Abstract



Motivated by multi-task machine learning with Banach spaces, we propose the notion of vector-valued reproducing kernel Banach spaces (RKBS). Basic properties of the spaces and the associated reproducing kernels are investigated. We also present feature map constructions and several concrete examples of vector-valued RKBS. The theory is then applied to multi-task machine learning. In particular, the representer theorem and characterization equations for the minimizer of regularized learning schemes in vector-valued RKBS are established.
Keywords: vector-valued reproducing kernel Banach spaces, feature maps, regularized learning, 
the representer theorem, characterization equations. 



1 Introduction

The purpose of this paper is to establish the notion of vector-valued reproducing kernel Banach spaces and to demonstrate its applications to multi-task machine learning. Built on the theory of scalar-valued reproducing kernel Hilbert spaces (RKHS) [3], kernel methods have proven successful in single-task machine learning [11, 14, 21, 31, 33]. Multi-task learning, where the unknown target function to be learned from finite sample data is vector-valued, arises frequently in practice. References [13, 25] proposed the development of kernel methods for learning multiple related tasks simultaneously. The mathematical foundation used there was the theory of vector-valued RKHS [5, 27]. Recent progress in vector-valued RKHS can be found in [5, 8]. In such a framework, both the space of candidate functions used for approximation and the output space are chosen as Hilbert spaces.

There are some occasions where it might be desirable to select the space of candidate functions, the output space, or both as Banach spaces. Hilbert spaces constitute a special and limited class of Banach spaces: any two Hilbert spaces over a common number field with the same dimension are isometrically isomorphic. By reaching out to other Banach spaces, one obtains more variety in geometric structures and norms that are potentially useful for learning and approximation. Moreover, training data might come with intrinsic structures that make them impossible or inappropriate to embed into a Hilbert space, and learning schemes based on features in a Hilbert space may not work well for them. Finally, in some applications, a Banach space norm is engaged for a particular purpose. A typical example is the linear programming regularization in coefficient-based regularization for machine learning [29], where the $\ell^1$ norm is employed to obtain sparsity in the resulting minimizer.

There has been considerable work on learning a single task with Banach spaces (see, for example, [12, 26, 30]). The difficulty in mapping patterns into a Banach space and making use of these features for learning lies mainly in the lack of an inner product in Banach spaces. In particular, without an appropriate counterpart of the Riesz representation of continuous linear functionals, point evaluations do not have a kernel representation in these studies. Semi-inner products, a mathematical tool introduced by Lumer [23] for the purpose of extending Hilbert space type arguments to Banach spaces, seem to be a natural substitute for inner products in Banach spaces. An illustrative example is that we were able to extend the classical theory of frames and Riesz bases to Banach spaces via semi-inner products [38]. Semi-inner products were first applied to machine learning by Der and Lee [12] for the study of large margin classification by hyperplanes in a Banach space. With this tool, we established the notion of scalar-valued reproducing kernel Banach spaces (RKBS) and investigated regularized learning schemes in RKBS [36, 37]. There has been increasing interest in the application of this new theory [19, 30].

We attempt to build a mathematical foundation for multi-task learning with Banach spaces. Specifically, we shall propose a definition of vector-valued RKBS and investigate its fundamental properties in the next section. Feature map representations and several concrete examples of vector-valued RKBS will be presented in Sections 3 and 4, respectively. In Section 5, we investigate regularized learning schemes in vector-valued RKBS.



2 Definition and Basic Properties 

We are concerned with spaces of functions from a fixed set to a vector space. We shall allow both the space of functions and the range space to be Banach spaces. Our key tool in dealing with a general Banach space is the semi-inner product [16, 23]. Recall that a semi-inner product on a Banach space $V$ is a function from $V \times V$ to $\mathbb{C}$, denoted by $[\cdot,\cdot]_V$, such that for all $f, g, h \in V$ and $\alpha, \beta \in \mathbb{C}$

1. (linearity with respect to the first variable) $[\alpha f + \beta g, h]_V = \alpha [f, h]_V + \beta [g, h]_V$;

2. (positivity) $[f, f]_V > 0$ for $f \neq 0$;

3. (conjugate homogeneity with respect to the second variable) $[f, \alpha g]_V = \overline{\alpha}\,[f, g]_V$;

4. (Cauchy-Schwarz inequality) $|[f, g]_V| \le [f, f]_V^{1/2}\,[g, g]_V^{1/2}$.
A semi-inner product $[\cdot,\cdot]_V$ on $V$ is said to be compatible if

$$ [f, f]_V^{1/2} = \|f\|_V \quad \text{for all } f \in V, $$

where $\|\cdot\|_V$ denotes the norm on $V$. Every Banach space has a compatible semi-inner product [16, 23]. Let $[\cdot,\cdot]_V$ be a compatible semi-inner product on $V$. Then one sees by the Cauchy-Schwarz inequality that for each $f \in V$, the linear functional $f^*$ on $V$ defined by

$$ f^*(g) := [g, f]_V, \quad g \in V, \tag{2.1} $$

is bounded on $V$. In other words, $f^*$ lies in the dual space $V^*$ of $V$. Moreover, we have

$$ \|f^*\|_{V^*} = \|f\|_V \tag{2.2} $$

and

$$ f^*(f) = \|f\|_V \|f^*\|_{V^*}. \tag{2.3} $$

Introduce the duality mapping $J_V$ from $V$ to $V^*$ by setting

$$ J_V(f) := f^*, \quad f \in V. $$
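As a concrete check of (2.1)-(2.3), the sketch below works in real $\ell^p_n$ with the semi-inner product that the paper introduces in Section 3 as (3.4). Restricting to real vectors, the exponent $p = 3$, and the helper names `lp_norm`, `sip`, `dual_element` are choices made for this illustration only. It verifies compatibility, the isometry (2.2), the identity (2.3), and the Cauchy-Schwarz inequality numerically.

```python
import math

def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def sip(u, v, p):
    """[u, v] = sum_j u_j v_j |v_j|^(p-2) / ||v||_p^(p-2) for real vectors, v != 0."""
    nv = lp_norm(v, p)
    return sum(a * b * abs(b) ** (p - 2) for a, b in zip(u, v)) / nv ** (p - 2)

def dual_element(f, p):
    """Vector representing f* = J(f) via g -> sum_j g_j (f*)_j; it lies in l^q_n."""
    nf = lp_norm(f, p)
    return [t * abs(t) ** (p - 2) / nf ** (p - 2) for t in f]

p = 3.0
q = p / (p - 1)                                  # conjugate exponent, 1/p + 1/q = 1
f = [1.0, -2.0, 0.5]
g = [0.3, 1.0, -4.0]
f_star = dual_element(f, p)

compat = sip(f, f, p)                            # compatibility: ||f||_p^2
dual_norm = lp_norm(f_star, q)                   # (2.2): ||f*||_q = ||f||_p
pairing = sum(a * b for a, b in zip(f, f_star))  # (2.3): f*(f) = ||f|| ||f*||
cs_lhs = abs(sip(g, f, p))                       # Cauchy-Schwarz inequality
cs_rhs = math.sqrt(sip(g, g, p) * sip(f, f, p))
```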

We would like to represent the continuous linear functionals on the vector-valued RKBS to be introduced by means of the semi-inner product. However, the semi-inner product might not be able to fulfill this important role for an arbitrary Banach space. For instance, one verifies that the continuous linear functional

$$ \mu(g) := \sum_{j=1}^{\infty} \frac{(-1)^j}{2^j}\, g\Big(\frac{1}{j}\Big), \quad g \in C([0,1]), $$
on $C([0,1])$ endowed with the usual maximum norm cannot be represented as

$$ \mu(g) = [g, f], \quad g \in C([0,1]), $$

for any compatible semi-inner product $[\cdot,\cdot]$ on $C([0,1])$ and any $f \in C([0,1])$.

The above example indicates that the duality mapping might not be surjective for a general Banach space. Other problems, such as non-uniqueness of compatible semi-inner products and non-injectivity of the duality mapping, may also occur. To overcome these difficulties, in this preliminary work on vector-valued RKBS we shall focus on Banach spaces that are uniformly convex and uniformly Fréchet differentiable. A Banach space $V$ is uniformly convex if for all $\varepsilon > 0$ there exists a $\delta > 0$ such that

$$ \|f + g\|_V \le 2 - \delta \quad \text{for all } f, g \in V \text{ with } \|f\|_V = \|g\|_V = 1 \text{ and } \|f - g\|_V \ge \varepsilon. $$

Uniform convexity ensures the injectivity of the duality mapping and the existence and uniqueness of the best approximation to a closed convex subset of $V$ [16]. We also say that $V$ is uniformly Fréchet differentiable if for all $f, g \in V$

$$ \lim_{t \in \mathbb{R},\, t \to 0} \frac{\|f + t g\|_V - \|f\|_V}{t} \tag{2.4} $$

exists and the limit is approached uniformly for all $f, g$ in the unit sphere of $V$. If $V$ is uniformly Fréchet differentiable, then it has a unique compatible semi-inner product [16]. The differentiability (2.4) of the norm is useful for deriving characterization equations for the minimizer of regularized learning schemes in Banach spaces. For simplicity, we call a Banach space uniform if it is both uniformly convex and uniformly Fréchet differentiable. An analogue of the Riesz representation theorem holds for uniform Banach spaces.
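The limit in (2.4) can be probed numerically. The sketch below assumes the standard identity (from Giles' theory, not stated in this section) that for a uniformly Fréchet differentiable real space the limit in (2.4) equals $[g, f]_V / \|f\|_V$, and compares it with difference quotients in real $\ell^4_3$; the vectors and step sizes are arbitrary choices.

```python
def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def sip(u, v, p):
    """Compatible semi-inner product on real l^p_n (the form used in Section 3)."""
    nv = lp_norm(v, p)
    return sum(a * b * abs(b) ** (p - 2) for a, b in zip(u, v)) / nv ** (p - 2)

def difference_quotient(f, g, p, t):
    """(||f + t g|| - ||f||) / t, the quantity whose limit appears in (2.4)."""
    moved = [a + t * b for a, b in zip(f, g)]
    return (lp_norm(moved, p) - lp_norm(f, p)) / t

p = 4.0
f = [1.0, -2.0, 0.5]
g = [0.3, 1.0, -4.0]
predicted = sip(g, f, p) / lp_norm(f, p)   # assumed value of the limit
estimates = [difference_quotient(f, g, p, t) for t in (1e-2, 1e-4, 1e-6)]
```

As $t$ shrinks, the entries of `estimates` approach `predicted`.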

Lemma 2.1 (Giles [16]) Let $V$ be a uniform Banach space. Then it has a unique compatible semi-inner product $[\cdot,\cdot]_V$ and the duality mapping $J_V$ is bijective from $V$ to $V^*$. In other words, for each $\mu \in V^*$ there exists a unique $f \in V$ such that

$$ \mu(g) = [g, f]_V \quad \text{for all } g \in V. $$

In this case,

$$ [f^*, g^*]_{V^*} := [g, f]_V, \quad f, g \in V, \tag{2.5} $$

defines a compatible semi-inner product on $V^*$.
Let $V$ be a uniform Banach space. We shall always denote by $[\cdot,\cdot]_V$ the unique compatible semi-inner product on $V$. By Lemma 2.1 and equation (2.2), the duality mapping is bijective and isometric from $V$ to $V^*$. It is also conjugate homogeneous by property 3 of semi-inner products. However, it is non-additive unless $V$ reduces to a Hilbert space. As a consequence, a compatible semi-inner product is in general conjugate homogeneous but non-additive with respect to its second variable. Namely,

$$ [f, g + h]_V \neq [f, g]_V + [f, h]_V $$

in general.
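To see the failure of additivity in the second variable concretely, here is a small check in real $\ell^p_2$ (an illustrative choice), again with the semi-inner product of the form (3.4) used later in the paper. The gap vanishes in the Hilbert case $p = 2$ and equals $2^{-1/3} - 1$ for $p = 3$ with the vectors chosen below.

```python
def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def sip(u, v, p):
    nv = lp_norm(v, p)
    return sum(a * b * abs(b) ** (p - 2) for a, b in zip(u, v)) / nv ** (p - 2)

def additivity_gap(f, g, h, p):
    """[f, g + h] - [f, g] - [f, h]; identically zero when p == 2."""
    gh = [a + b for a, b in zip(g, h)]
    return sip(f, gh, p) - sip(f, g, p) - sip(f, h, p)

f, g, h = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
gap_hilbert = additivity_gap(f, g, h, 2.0)   # inner product: additive
gap_l3 = additivity_gap(f, g, h, 3.0)        # semi-inner product on l^3_2: not additive
```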

We are ready to present the definition of vector-valued RKBS. Let $\Lambda$ be a Banach space, which we shall sometimes call the output space, and let $X$ be a prescribed set, usually called the input space. A space $B$ is called a Banach space of $\Lambda$-valued functions on $X$ if it consists of certain functions from $X$ to $\Lambda$ and the norm on $B$ is compatible with point evaluations in the sense that

$$ \|f\|_B = 0 \text{ if and only if } f(x) = 0 \text{ for all } x \in X. $$

For instance, $L^p([0,1])$, $p > 1$, is not a Banach space of functions, while $C([0,1])$ is. We restrict our consideration to Banach spaces of functions so that point evaluations (usually referred to as "sampling" in applications) are well-defined.

Definition 2.2 We call $B$ a $\Lambda$-valued RKBS on $X$ if both $B$ and $\Lambda$ are uniform and $B$ is a Banach space of functions from $X$ to $\Lambda$ such that for every $x \in X$, the point evaluation $\delta_x : B \to \Lambda$ defined by

$$ \delta_x(f) := f(x), \quad f \in B, $$

is continuous from $B$ to $\Lambda$.

We shall derive a reproducing kernel for a vector-valued RKBS so defined. Throughout the rest of the paper, we let $[\cdot,\cdot]_B$ and $[\cdot,\cdot]_\Lambda$ be the unique compatible semi-inner products, and $J_B$ and $J_\Lambda$ the associated duality mappings, on $B$ and $\Lambda$, respectively. For two Banach spaces $V_1, V_2$, we denote by $\mathcal{M}(V_1, V_2)$ the set of all the bounded operators from $V_1$ to $V_2$, and by $\mathcal{L}(V_1, V_2)$ the subset of $\mathcal{M}(V_1, V_2)$ consisting of those bounded operators that are also linear. When $V_1 = V_2$, $\mathcal{M}(V_1, V_2)$ is abbreviated as $\mathcal{M}(V_1)$. For each $T \in \mathcal{M}(V_1, V_2)$, we denote by $\|T\|_{\mathcal{M}(V_1, V_2)}$ the greatest lower bound of all the nonnegative constants $\alpha$ such that

$$ \|T u\|_{V_2} \le \alpha \|u\|_{V_1} \quad \text{for all } u \in V_1. $$

When $T$ is also linear, this quantity equals the operator norm $\|T\|_{\mathcal{L}(V_1, V_2)}$ of $T$ in $\mathcal{L}(V_1, V_2)$. In this language, we require that the point evaluation $\delta_x$ on a $\Lambda$-valued RKBS on $X$ belong to $\mathcal{L}(B, \Lambda)$ for all $x \in X$.

Theorem 2.3 Let $B$ be a $\Lambda$-valued RKBS on $X$. Then there exists a unique function $K$ from $X \times X$ to $\mathcal{M}(\Lambda)$ such that

(1) $K(x, \cdot)\xi \in B$ for all $x \in X$ and $\xi \in \Lambda$,

(2) for all $f \in B$, $x \in X$, and $\xi \in \Lambda$

$$ [f(x), \xi]_\Lambda = [f, K(x, \cdot)\xi]_B, \tag{2.6} $$

(3) for all $x, y \in X$

$$ \|K(x, y)\|_{\mathcal{M}(\Lambda)} \le \|\delta_x\|_{\mathcal{L}(B, \Lambda)} \|\delta_y\|_{\mathcal{L}(B, \Lambda)}. \tag{2.7} $$
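Proof: Let $x \in X$ and $\xi \in \Lambda$. As $\delta_x \in \mathcal{L}(B, \Lambda)$, we see that

$$ |[f(x), \xi]_\Lambda| \le \|f(x)\|_\Lambda \|\xi\|_\Lambda \le \|\delta_x\|_{\mathcal{L}(B, \Lambda)} \|f\|_B \|\xi\|_\Lambda. \tag{2.8} $$

The above inequality together with the linearity of the semi-inner product with respect to its first variable implies that

$$ f \mapsto [f(x), \xi]_\Lambda $$

is a bounded linear functional on $B$. By Lemma 2.1 there exists a unique function $g_{x,\xi} \in B$ such that

$$ [f(x), \xi]_\Lambda = [f, g_{x,\xi}]_B \quad \text{for all } f \in B. \tag{2.9} $$

Define a function $K$ from $X \times X$ to the set of operators from $\Lambda$ to $\Lambda$ by setting

$$ K(x, y)\xi := g_{x,\xi}(y), \quad x, y \in X,\ \xi \in \Lambda. $$

Clearly, $K$ satisfies the two requirements (1) and (2). It is also unique by the uniqueness of the function $g_{x,\xi}$ satisfying (2.9). It remains to show that it is bounded. To this end, we get by (2.8) that

$$ \|K(x, \cdot)\xi\|_B = \sup_{f \in B,\, \|f\|_B \le 1} |[f, K(x, \cdot)\xi]_B| = \sup_{f \in B,\, \|f\|_B \le 1} |[f(x), \xi]_\Lambda| \le \|\delta_x\|_{\mathcal{L}(B, \Lambda)} \|\xi\|_\Lambda. $$

It follows that

$$ \|K(x, y)\xi\|_\Lambda \le \|\delta_y\|_{\mathcal{L}(B, \Lambda)} \|K(x, \cdot)\xi\|_B \le \|\delta_x\|_{\mathcal{L}(B, \Lambda)} \|\delta_y\|_{\mathcal{L}(B, \Lambda)} \|\xi\|_\Lambda, $$

which proves (2.7). □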



We call the above function $K$ the reproducing kernel of $B$. It coincides with the usual reproducing kernel when $B$ is a Hilbert space and $\Lambda = \mathbb{C}$, and with the vector-valued reproducing kernel when both $B$ and $\Lambda$ are Hilbert spaces. We now explore basic properties of vector-valued RKBS and their reproducing kernels for further investigation and applications.

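Let $(\delta_x)^*$ be the adjoint operator of $\delta_x$ for all $x \in X$. For a Banach space $V$, denote by $\langle \cdot, \cdot \rangle_V$ the bilinear form on $V \times V^*$ defined by

$$ \langle v, \nu \rangle_V := \nu(v), \quad v \in V,\ \nu \in V^*. $$

Thus, $(\delta_x)^*$ is defined by

$$ \langle f, (\delta_x)^* \xi^* \rangle_B = \langle \delta_x(f), \xi^* \rangle_\Lambda = \langle f(x), \xi^* \rangle_\Lambda = [f(x), \xi]_\Lambda, \quad f \in B,\ \xi \in \Lambda. \tag{2.10} $$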

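Proposition 2.4 Let $B$ be a $\Lambda$-valued RKBS on $X$ and $K$ its reproducing kernel. Then for all $x, y \in X$ and $\xi, \eta, \tau \in \Lambda$ there hold

$$ [K(x, x)\xi, \xi]_\Lambda \ge 0, \quad |[K(x, y)\xi, \eta]_\Lambda| \le [K(x, x)\xi, \xi]_\Lambda^{1/2}\, [K(y, y)\eta, \eta]_\Lambda^{1/2}, \tag{2.11} $$

$$ \|K(x, y)\|_{\mathcal{M}(\Lambda)} \le \|K(x, x)\|_{\mathcal{M}(\Lambda)}^{1/2}\, \|K(y, y)\|_{\mathcal{M}(\Lambda)}^{1/2}, \tag{2.12} $$

$$ K(x, \cdot)\xi = J_B^{-1} (\delta_x)^* J_\Lambda(\xi), \tag{2.13} $$

$$ K(x, y)(\alpha \xi) = \alpha K(x, y)\xi \quad \text{for all } \alpha \in \mathbb{C}, \tag{2.14} $$

$$ \|K(x, \cdot)\xi\|_B \le \|\delta_x\|_{\mathcal{L}(B, \Lambda)} \|\xi\|_\Lambda, \quad \|K(x, \cdot)\xi\|_B \le \|K(x, x)\|_{\mathcal{M}(\Lambda)}^{1/2} \|\xi\|_\Lambda, \tag{2.15} $$

$$ (K(x, \cdot)\xi)^* + (K(x, \cdot)\eta)^* = (K(x, \cdot)\tau)^* \quad \text{whenever } \tau^* = \xi^* + \eta^*, \tag{2.16} $$

$$ \operatorname{span}\{(K(x, \cdot)\xi)^* : x \in X,\ \xi \in \Lambda\} \text{ is dense in } B^*. \tag{2.17} $$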



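Proof: By (2.6),

$$ [K(x, x)\xi, \xi]_\Lambda = [K(x, \cdot)\xi, K(x, \cdot)\xi]_B = \|K(x, \cdot)\xi\|_B^2 \ge 0, \tag{2.18} $$

which proves the first inequality in (2.11). For the second one, we use (2.6) and the Cauchy-Schwarz inequality of semi-inner products to get that

$$ |[K(x, y)\xi, \eta]_\Lambda| = |[K(x, \cdot)\xi, K(y, \cdot)\eta]_B| \le [K(x, \cdot)\xi, K(x, \cdot)\xi]_B^{1/2}\, [K(y, \cdot)\eta, K(y, \cdot)\eta]_B^{1/2} = [K(x, x)\xi, \xi]_\Lambda^{1/2}\, [K(y, y)\eta, \eta]_\Lambda^{1/2}. $$

It follows from (2.11) that

$$ |[K(x, y)\xi, \eta]_\Lambda| \le \|K(x, x)\xi\|_\Lambda^{1/2} \|\xi\|_\Lambda^{1/2} \|K(y, y)\eta\|_\Lambda^{1/2} \|\eta\|_\Lambda^{1/2} \le \|K(x, x)\|_{\mathcal{M}(\Lambda)}^{1/2} \|K(y, y)\|_{\mathcal{M}(\Lambda)}^{1/2} \|\xi\|_\Lambda \|\eta\|_\Lambda. $$

Since $\|K(x, y)\xi\|_\Lambda = \sup\{|[K(x, y)\xi, \eta]_\Lambda| : \eta \in \Lambda,\ \|\eta\|_\Lambda = 1\}$, we have by the above inequality that

$$ \|K(x, y)\xi\|_\Lambda \le \|K(x, x)\|_{\mathcal{M}(\Lambda)}^{1/2} \|K(y, y)\|_{\mathcal{M}(\Lambda)}^{1/2} \|\xi\|_\Lambda, $$

which proves (2.12).

Turning to (2.13), we notice for each $f \in B$ that

$$ [f, J_B^{-1}(\delta_x)^* J_\Lambda(\xi)]_B = \langle f, (\delta_x)^* \xi^* \rangle_B = \langle \delta_x(f), \xi^* \rangle_\Lambda = \langle f(x), \xi^* \rangle_\Lambda = [f(x), \xi]_\Lambda, $$

which together with (2.6) confirms (2.13). Since the duality mappings are conjugate homogeneous, we have by (2.13) that

$$ K(x, \cdot)(\alpha \xi) = J_B^{-1}(\delta_x)^* J_\Lambda(\alpha \xi) = J_B^{-1}\big(\overline{\alpha}\,(\delta_x)^* J_\Lambda(\xi)\big) = \alpha J_B^{-1}(\delta_x)^* J_\Lambda(\xi) = \alpha K(x, \cdot)\xi, $$

which implies (2.14).

Recall that the duality mappings $J_B$ and $J_\Lambda$ are isometric. Note also that a bounded linear operator and its adjoint have equal operator norms. Using these two facts, we obtain from equation (2.13) that

$$ \|K(x, \cdot)\xi\|_B \le \|(\delta_x)^*\|_{\mathcal{L}(\Lambda^*, B^*)} \|\xi\|_\Lambda = \|\delta_x\|_{\mathcal{L}(B, \Lambda)} \|\xi\|_\Lambda, $$

which is the first inequality in (2.15). The second one follows immediately from (2.18), since $\|K(x, \cdot)\xi\|_B^2 = [K(x, x)\xi, \xi]_\Lambda \le \|K(x, x)\|_{\mathcal{M}(\Lambda)} \|\xi\|_\Lambda^2$.

Let $\xi, \eta, \tau \in \Lambda$ be such that $\tau^* = \xi^* + \eta^*$. By (2.13),

$$ (K(x, \cdot)\xi)^* + (K(x, \cdot)\eta)^* = (\delta_x)^* \xi^* + (\delta_x)^* \eta^* = (\delta_x)^* (\xi^* + \eta^*) = (\delta_x)^* \tau^* = (K(x, \cdot)\tau)^*. $$

Equation (2.16) hence holds true.

For the last property, note that $B$ is reflexive, being uniformly convex. By the Hahn-Banach theorem, it therefore suffices to show that the only $f \in B$ that vanishes on $\operatorname{span}\{(K(x, \cdot)\xi)^* : x \in X,\ \xi \in \Lambda\}$ is $f = 0$. Let $f \in B$ be such a vector. Then

$$ [f(x), \xi]_\Lambda = [f, K(x, \cdot)\xi]_B = \langle f, (K(x, \cdot)\xi)^* \rangle_B = 0 \quad \text{for all } x \in X,\ \xi \in \Lambda, $$

which implies that $f(x) = 0$ for all $x \in X$. As $B$ is a Banach space of functions, $f = 0$ as a vector in the Banach space $B$. Therefore, (2.17) is true. The proof is complete. □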



We observe from the above proposition that the reproducing kernel of a vector-valued RKBS enjoys many properties similar to those of the reproducing kernel of a vector-valued RKHS. However, there are several significant differences due to the nature of a semi-inner product. Firstly, although for all $x, y \in X$, $K(x, y)$ remains a homogeneous bounded operator on $\Lambda$, it is generally non-additive. This can be seen from (2.13), where $J_\Lambda$ or $J_B^{-1}$ is non-additive. Secondly, it is well known that when $\Lambda$ is a Hilbert space, a function $K : X \times X \to \mathcal{L}(\Lambda)$ is the reproducing kernel of some $\Lambda$-valued RKHS on $X$ if and only if for all $m \in \mathbb{N}$, $\xi_j \in \Lambda$ and pairwise distinct $x_j \in X$, $j = 1, 2, \ldots, m$,

$$ \sum_{j=1}^{m} \sum_{k=1}^{m} [K(x_j, x_k)\xi_j, \xi_k]_\Lambda \ge 0. \tag{2.19} $$

Although (2.19) still holds for the reproducing kernel of a vector-valued RKBS when $m \le 2$ and the number field is $\mathbb{R}$, it may cease to be true once the number of sampling points $m$ exceeds 2. An example will be constructed in the next section. Finally, the denseness property (2.17) in the dual space $B^*$ does not necessarily imply that

$$ \overline{\operatorname{span}}\{K(x, \cdot)\xi : x \in X,\ \xi \in \Lambda\} = B. \tag{2.20} $$

A negative example will also be given in the next section after we present a construction of vector-valued RKBS through feature maps. Before that, we present another important property of a vector-valued RKBS.

Proposition 2.5 Let $B$ be a $\Lambda$-valued RKBS on $X$. Suppose that $f_n \in B$, $n \in \mathbb{N}$, converges to some $f_0 \in B$. Then $f_n(x)$ converges to $f_0(x)$ in the topology of $\Lambda$ for each $x \in X$. The convergence is uniform on the set where $\|K(x, x)\|_{\mathcal{M}(\Lambda)}$ is bounded.

Proof: Suppose that $\|f_n - f_0\|_B$ converges to zero as $n$ tends to infinity. We get by (2.6) and (2.15) that

$$ \|f_n(x) - f_0(x)\|_\Lambda = \sup_{\xi \in \Lambda,\, \|\xi\|_\Lambda = 1} |[f_n(x) - f_0(x), \xi]_\Lambda| = \sup_{\xi \in \Lambda,\, \|\xi\|_\Lambda = 1} |[f_n - f_0, K(x, \cdot)\xi]_B| \le \sup_{\xi \in \Lambda,\, \|\xi\|_\Lambda = 1} \|f_n - f_0\|_B \|K(x, \cdot)\xi\|_B \le \|f_n - f_0\|_B \|K(x, x)\|_{\mathcal{M}(\Lambda)}^{1/2}. $$

Therefore, $f_n(x)$ converges pointwise to $f_0(x)$ on $X$, and the convergence is uniform on the set where $\|K(x, x)\|_{\mathcal{M}(\Lambda)}$ is bounded. □



3 Feature Map Representations 

Feature map representations form the most important way of expressing reproducing kernels. To introduce feature maps for the reproducing kernel of a vector-valued RKBS, we need the notion of the generalized adjoint [22] of a bounded linear operator between Banach spaces. Let $V_1, V_2$ be two uniform Banach spaces with the compatible semi-inner products $[\cdot,\cdot]_{V_1}$ and $[\cdot,\cdot]_{V_2}$, respectively. The generalized adjoint of a $T \in \mathcal{L}(V_1, V_2)$ is an operator $T^\dagger \in \mathcal{M}(V_2, V_1)$ defined by

$$ [T u, v]_{V_2} = [u, T^\dagger v]_{V_1}, \quad u \in V_1,\ v \in V_2. $$

It can be identified that

$$ T^\dagger = J_{V_1}^{-1} T^* J_{V_2}. $$

Thus, $T^\dagger$ is indeed bounded, as

$$ \|T^\dagger\|_{\mathcal{M}(V_2, V_1)} = \|T^*\|_{\mathcal{L}(V_2^*, V_1^*)} = \|T\|_{\mathcal{L}(V_1, V_2)}. $$
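The identity $T^\dagger = J_{V_1}^{-1} T^* J_{V_2}$ can be tested in finite dimensions. In this sketch the scalars are real, the exponents $a = 3$, $b = 1.5$ and the $2 \times 3$ matrix are hypothetical, $T^*$ acts on dual vectors as the transpose, and $J_{\ell^a}^{-1}$ is computed by the dual-element formula with the conjugate exponent (an assumption consistent with the $\ell^p$ computations of Section 3). It checks $[T u, v]_{V_2} = [u, T^\dagger v]_{V_1}$.

```python
def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def sip(u, v, p):
    nv = lp_norm(v, p)
    return sum(x * y * abs(y) ** (p - 2) for x, y in zip(u, v)) / nv ** (p - 2)

def duality(v, p):
    """J_{l^p}(v) as a vector of l^{p'}; the pairing is the plain dot product."""
    nv = lp_norm(v, p)
    return [t * abs(t) ** (p - 2) / nv ** (p - 2) for t in v]

def conj(p):
    return p / (p - 1)

def matvec(T, u):
    return [sum(row[k] * u[k] for k in range(len(u))) for row in T]

def transpose(T):
    return [list(col) for col in zip(*T)]

def generalized_adjoint(T, v, a, b):
    """T† v for T : l^a -> l^b, computed as J_{l^a}^{-1} T^T J_{l^b} v."""
    return duality(matvec(transpose(T), duality(v, b)), conj(a))

a, b = 3.0, 1.5
T = [[1.0, 2.0, 0.0], [0.5, -1.0, 3.0]]   # maps R^3 -> R^2
u = [0.7, -1.2, 2.0]
v = [1.0, -0.4]                            # nonzero entries, since b - 2 < 0
lhs = sip(matvec(T, u), v, b)              # [T u, v]_{l^b}
rhs = sip(u, generalized_adjoint(T, v, a, b), a)   # [u, T† v]_{l^a}
```

Note that `generalized_adjoint` is homogeneous in `v` but not linear, mirroring $T^\dagger \in \mathcal{M}(V_2, V_1)$.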
We are in a position to present a characterization of the reproducing kernel of a vector-valued RKBS.

Theorem 3.1 A function $K : X \times X \to \mathcal{M}(\Lambda)$ is the reproducing kernel of some $\Lambda$-valued RKBS on $X$ if and only if there exist a uniform Banach space $W$ and a mapping $\Phi : X \to \mathcal{L}(W, \Lambda)$ such that

$$ K(x, y) = \Phi(y)\Phi^\dagger(x), \quad x, y \in X, \tag{3.1} $$

and

$$ \overline{\operatorname{span}}\{(\Phi^\dagger(x)\xi)^* : x \in X,\ \xi \in \Lambda\} = W^*. \tag{3.2} $$

Here $\Phi^\dagger$ is the function from $X$ to $\mathcal{M}(\Lambda, W)$ defined by $\Phi^\dagger(x) := (\Phi(x))^\dagger$, $x \in X$.

Proof: Suppose that $K$ is the reproducing kernel of some $\Lambda$-valued RKBS $B$ on $X$. Set $W := B$ and define $\Phi : X \to \mathcal{L}(W, \Lambda)$ by

$$ (\Phi(x))(f) := f(x), \quad f \in B,\ x \in X. $$

To identify $\Phi^\dagger$, we observe by the reproducing property (2.6) for all $\xi \in \Lambda$ and $f \in B$ that

$$ [f, \Phi^\dagger(x)\xi]_B = [\Phi(x) f, \xi]_\Lambda = [f(x), \xi]_\Lambda = [f, K(x, \cdot)\xi]_B, \quad x \in X,\ \xi \in \Lambda, $$

which implies that $\Phi^\dagger(x)\xi = K(x, \cdot)\xi$ for all $x \in X$ and $\xi \in \Lambda$. Requirement (3.2) is fulfilled by (2.17). By the forms of $\Phi$ and $\Phi^\dagger$, we obtain that

$$ \Phi(y)\Phi^\dagger(x)\xi = \Phi(y)(K(x, \cdot)\xi) = K(x, y)\xi, $$

which proves (3.1).

On the other hand, suppose that $K$ is of the form (3.1) in terms of some mapping $\Phi$ satisfying the denseness condition (3.2). We shall construct the RKBS that takes $K$ as its reproducing kernel. For this purpose, we let $B$ be composed of functions from $X$ to $\Lambda$ of the form

$$ f_u(x) := \Phi(x) u, \quad x \in X, \text{ for some } u \in W. $$

Since each $\Phi(x)$ is a linear operator, $B$ is a linear vector space. We impose a norm on $B$ by setting

$$ \|f_u\|_B := \|u\|_W, \quad u \in W. $$

To verify that this is a well-defined norm, it suffices to show that the representer $u$ of a function $f_u \in B$ is unique. Assume that $f_u = 0$. Then for all $x \in X$ and $\xi \in \Lambda$,

$$ \langle u, (\Phi^\dagger(x)\xi)^* \rangle_W = [u, \Phi^\dagger(x)\xi]_W = [\Phi(x) u, \xi]_\Lambda = [0, \xi]_\Lambda = 0, $$

which combined with (3.2) implies that $u = 0$. The arguments also show that $B$ is a Banach space of functions. Moreover, it is a uniform Banach space, as it is isometrically isomorphic to $W$. Clearly, we have for each $x \in X$ and $u \in W$ that

$$ \|f_u(x)\|_\Lambda = \|\Phi(x) u\|_\Lambda \le \|\Phi(x)\|_{\mathcal{L}(W, \Lambda)} \|u\|_W = \|\Phi(x)\|_{\mathcal{L}(W, \Lambda)} \|f_u\|_B, $$

which shows that point evaluations are bounded on $B$. We conclude that $B$ is a $\Lambda$-valued RKBS on $X$. It remains to prove that $K$ is the reproducing kernel of $B$. To this end, we identify the unique compatible semi-inner product on $B$ as

$$ [f_u, f_v]_B := [u, v]_W, \quad u, v \in W, $$

and observe for all $u \in W$, $x \in X$ and $\xi \in \Lambda$ that

$$ [f_u, K(x, \cdot)\xi]_B = [f_u, \Phi(\cdot)\Phi^\dagger(x)\xi]_B = [u, \Phi^\dagger(x)\xi]_W = [\Phi(x) u, \xi]_\Lambda = [f_u(x), \xi]_\Lambda, $$

which is what we want. The proof is complete. □

We call the Banach space $W$ and the mapping $\Phi$ in Theorem 3.1 a pair of feature space and feature map for $K$, respectively. The proof of Theorem 3.1 contains a construction of vector-valued RKBS by feature maps, which we state separately as a corollary below.

Corollary 3.2 Let $W$ be a uniform Banach space and $\Phi : X \to \mathcal{L}(W, \Lambda)$ be a feature map of $K$ that satisfies (3.1) and (3.2). Then the linear vector space

$$ B := \{\Phi(\cdot) u : u \in W\} $$

endowed with the norm

$$ \|\Phi(\cdot) u\|_B := \|u\|_W, \quad u \in W, $$

and the compatible semi-inner product

$$ [\Phi(\cdot) u, \Phi(\cdot) v]_B := [u, v]_W, \quad u, v \in W, $$

is a $\Lambda$-valued RKBS on $X$ with the reproducing kernel $K$ given by (3.1).
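The construction in Corollary 3.2 is easy to emulate when $X$ is finite and $W$, $\Lambda$ are finite-dimensional $\ell^p$ spaces. The sketch below uses a hypothetical two-point input set, $W = \ell^3_3$ and $\Lambda = \ell^4_2$ over the reals, and arbitrary matrices for $\Phi(x)$; $\Phi^\dagger(x)$ is computed as $J_W^{-1}\Phi(x)^T J_\Lambda$. It checks the reproducing identity $[f_u(x), \xi]_\Lambda = [u, \Phi^\dagger(x)\xi]_W = [f_u, K(x, \cdot)\xi]_B$ and shows that $K(x, y)$ is homogeneous, as in (2.14), but not additive.

```python
def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def sip(u, v, p):
    nv = lp_norm(v, p)
    return sum(a * b * abs(b) ** (p - 2) for a, b in zip(u, v)) / nv ** (p - 2)

def duality(v, p):
    nv = lp_norm(v, p)
    return [t * abs(t) ** (p - 2) / nv ** (p - 2) for t in v]

def matvec(M, u):
    return [sum(m * x for m, x in zip(row, u)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

r, p = 3.0, 4.0                 # W = l^r_3, Lambda = l^p_2 (hypothetical choices)
r_dual = r / (r - 1)
Phi = {                         # hypothetical feature map on X = {"x1", "x2"}
    "x1": [[1.0, 0.5, -1.0], [0.0, 2.0, 1.0]],
    "x2": [[0.3, -1.0, 2.0], [1.5, 0.0, 0.7]],
}

def phi_dagger(x, xi):
    """Phi†(x) xi: apply Phi(x)^T to J_Lambda(xi), then J_W^{-1} (dual map of l^{r'})."""
    return duality(matvec(transpose(Phi[x]), duality(xi, p)), r_dual)

def kernel(x, y, xi):
    """K(x, y) xi = Phi(y) Phi†(x) xi, as in (3.1)."""
    return matvec(Phi[y], phi_dagger(x, xi))

u = [0.4, -1.1, 0.9]
xi, eta = [1.0, -0.5], [0.2, 1.3]
lhs = sip(matvec(Phi["x1"], u), xi, p)     # [f_u(x1), xi]_Lambda
rhs = sip(u, phi_dagger("x1", xi), r)      # [u, Phi†(x1) xi]_W = [f_u, K(x1,.)xi]_B
scaled = kernel("x1", "x2", [2.5 * t for t in xi])
homog_gap = max(abs(a - 2.5 * b) for a, b in zip(scaled, kernel("x1", "x2", xi)))
sum_in = kernel("x1", "x2", [a + b for a, b in zip(xi, eta)])
sum_out = [a + b for a, b in zip(kernel("x1", "x2", xi), kernel("x1", "x2", eta))]
additivity_gap = max(abs(a - b) for a, b in zip(sum_in, sum_out))
```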

As an interesting application of Corollary 3.2, we shall show that a vector-valued RKBS is always isometrically isomorphic to a scalar-valued RKBS on a different input space.

Corollary 3.3 If $B$ is a $\Lambda$-valued RKBS on $X$, then the following linear vector space $\tilde{B}$ of complex-valued functions $\tilde{f}$ on $\tilde{X} := X \times \Lambda$ of the form

$$ \tilde{f}(x, \xi) := [f(x), \xi]_\Lambda, \quad x \in X,\ \xi \in \Lambda,\ f \in B, $$

is an RKBS on $\tilde{X}$ with the norm

$$ \|\tilde{f}\|_{\tilde{B}} := \|f\|_B, \quad f \in B, $$

and the compatible semi-inner product

$$ [\tilde{f}, \tilde{g}]_{\tilde{B}} := [f, g]_B, \quad f, g \in B. $$

The reproducing kernel $\tilde{K}$ of $\tilde{B}$ is

$$ \tilde{K}((x, \xi), (y, \eta)) := [K(x, y)\xi, \eta]_\Lambda, \quad x, y \in X,\ \xi, \eta \in \Lambda. $$

Proof: It suffices to point out that $\tilde{B}$ is constructed by Corollary 3.2 via the choices

$$ \Lambda := \mathbb{C}, \quad W := B, \quad \Phi(x, \xi) := (K(x, \cdot)\xi)^*, \quad (x, \xi) \in \tilde{X}. $$

The feature map satisfies the denseness condition by (2.17). □

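We shall next construct, by Corollary 3.2, simple vector-valued RKBS to show that the reproducing kernel of a general vector-valued RKBS might not satisfy (2.19) or (2.20). Let $p, q, r, s \in (1, +\infty)$ satisfy

$$ \frac{1}{p} + \frac{1}{q} = \frac{1}{r} + \frac{1}{s} = 1. \tag{3.3} $$

Here, for the sake of convenience in enumerating elements from a finite set, we set $\mathbb{N}_l := \{1, 2, \ldots, l\}$ for $l \in \mathbb{N}$. For each $\gamma \in (1, +\infty)$ and $l \in \mathbb{N}$, $\ell^\gamma_l$ denotes the Banach space of all vectors $u = (u_j : j \in \mathbb{N}_l) \in \mathbb{C}^l$ with the norm

$$ \|u\|_{\ell^\gamma_l} := \Big( \sum_{j \in \mathbb{N}_l} |u_j|^\gamma \Big)^{1/\gamma}. $$

The space $\ell^\gamma_l$ is a uniform Banach space with the compatible semi-inner product

$$ [u, v]_{\ell^\gamma_l} := \frac{\sum_{j \in \mathbb{N}_l} u_j\, \overline{v_j}\, |v_j|^{\gamma - 2}}{\|v\|_{\ell^\gamma_l}^{\gamma - 2}}, \quad u, v \in \ell^\gamma_l. \tag{3.4} $$

The dual element $u^*$ of $u \in \ell^\gamma_l$ is hence given by

$$ u^* = \Big( \frac{\overline{u_j}\, |u_j|^{\gamma - 2}}{\|u\|_{\ell^\gamma_l}^{\gamma - 2}} : j \in \mathbb{N}_l \Big). $$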
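The formula for $u^*$ also yields the inverse duality mapping: for conjugate exponents $r$ and $s$, applying the dual-element map of $\ell^r_l$ and then that of $\ell^s_l$ returns the original vector, which is how $J_W^{-1}$ is evaluated in the examples that follow. A quick numerical check with complex entries (the vector and the exponent are arbitrary choices):

```python
def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def dual_element(u, gamma):
    """u^* = (conj(u_j) |u_j|^(gamma-2) / ||u||^(gamma-2) : j); entries may be complex."""
    nu = lp_norm(u, gamma)
    return [t.conjugate() * abs(t) ** (gamma - 2) / nu ** (gamma - 2) for t in u]

r = 3.0
s = r / (r - 1)                       # 1/r + 1/s = 1
u = [1 + 2j, -0.5 + 0j, 0.3 - 1.1j]
u_star = dual_element(u, r)           # an element of l^s_3
back = dual_element(u_star, s)        # should reproduce u
pairing = sum(a * b for a, b in zip(u, u_star))   # u^*(u) = ||u||_r^2
```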



Non-completeness of the linear span of the reproducing kernel in $B$. We first give a counterexample to (2.20). Let $m, n \in \mathbb{N}$. We choose the output space $\Lambda$ and the feature space $W$ as $\ell^p_n$ and $\ell^r_m$, respectively. Thus, we have that $\Lambda^* = \ell^q_n$ and $W^* = \ell^s_m$. The input space will be chosen as a set of $m$ discrete points $X := \{x_j : j \in \mathbb{N}_m\}$. A feature map $\Phi : X \to \mathcal{L}(W, \Lambda)$ should satisfy the denseness condition (3.2). We note by the definition of the generalized adjoint that this condition is equivalent to

$$ \overline{\operatorname{span}}\{\Phi^*(x)\xi^* : x \in X,\ \xi \in \Lambda\} = W^*, \tag{3.5} $$

where $\Phi^*(x) := (\Phi(x))^*$ for all $x \in X$.

Let us take a close look at equation (2.20). By Corollary 3.2, a general function in $B$ is of the form $f_u := \Phi(\cdot) u$ for some $u \in W$. Equation (2.20) does not hold true if and only if there exists a nontrivial $u \in W$ such that

$$ [K(x, \cdot)\xi, f_u]_B = [\Phi(\cdot)\Phi^\dagger(x)\xi, \Phi(\cdot) u]_B = [\Phi^\dagger(x)\xi, u]_W = 0 \quad \text{for all } x \in X,\ \xi \in \Lambda, $$

which in turn is equivalent to the statement that $\operatorname{span}\{\Phi^\dagger(x)\xi : x \in X,\ \xi \in \Lambda\}$ is not dense in $W$. We conclude that to construct a $\Lambda$-valued RKBS for which (2.20) is not true, it suffices to find a feature map $\Phi : X \to \mathcal{L}(W, \Lambda)$ that satisfies (3.5) but

$$ \overline{\operatorname{span}}\{\Phi^\dagger(x)\xi : x \in X,\ \xi \in \Lambda\} \subsetneq W. \tag{3.6} $$
To this end, we find a sequence of vectors $w_j \in \mathbb{C}^m$ and set

$$ \Phi^*(x_j)\xi^* := (\xi^*)_1\, w_j, \quad j \in \mathbb{N}_m, \tag{3.7} $$

where $(\xi^*)_1$ is the first component of the vector $\xi^* \in \mathbb{C}^n$. Since for each $j \in \mathbb{N}_m$, $\Phi^*(x_j)$ is a linear operator from $\Lambda^*$ to $W^*$ and both spaces are finite-dimensional, $\Phi^*(x_j)$ is bounded. We reformulate (3.5) and (3.6) to get that they are respectively equivalent to

$$ \operatorname{span}\{w_j : j \in \mathbb{N}_m\} = \mathbb{C}^m \tag{3.8} $$

and

$$ \operatorname{span}\{J_W^{-1} w_j : j \in \mathbb{N}_m\} \subsetneq \mathbb{C}^m. \tag{3.9} $$



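Here for a vector $u = (u_j : j \in \mathbb{N}_m) \in \mathbb{C}^m$, we get by (3.4) that

$$ J_W^{-1} u = \Big( \frac{\overline{u_j}\, |u_j|^{s-2}}{\|u\|_{\ell^s_m}^{s-2}} : j \in \mathbb{N}_m \Big). $$

For real vectors $w_j$ the conjugation has no effect, and the positive factor $\|w_j\|_{\ell^s_m}^{2-s}$ merely rescales the $j$-th vector, which does not affect linear independence. Therefore, the task reduces to searching for an $m \times m$ nonsingular matrix that becomes singular when we apply the function $t \mapsto t|t|^{s-2}$ to each of its components. We find two such matrices as shown below: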



m = 4, s = 4, Ai :-- 



8 2 4 

5 5 1 

5 4 6 9 

9 4 8 



, and m = 4, s = 5, ^ := 



9 9 9 9 

8 6 2 

6 9 2 1 

7 4 9 9 
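As a smaller illustration of the same phenomenon (constructed for this sketch, not one of the paper's matrices), the code below checks a $3 \times 3$ integer matrix for $s = 3$, so that $W^* = \ell^3_3$ and, by (3.3), $W = \ell^{3/2}_3$. Its rows are built so that after $t \mapsto t|t|$ the third row becomes the sum of the first two, using $3^2 + 4^2 = 5^2$. Exact rational arithmetic from Python's `fractions` module avoids any rounding doubt; since transposition commutes with the componentwise map, rows or columns may serve as the $w_j$.

```python
from fractions import Fraction

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def signed_power(t, s):
    """t |t|^(s-2) for an integer exponent s >= 2, computed exactly."""
    t = Fraction(t)
    return t * abs(t) ** (s - 2)

A = [[1, 0, 3], [0, 1, 4], [1, 1, 5]]
s = 3
A_mapped = [[signed_power(t, s) for t in row] for row in A]
det_A = det3(A)                 # -2: the w_j span the whole space, so (3.8) holds
det_mapped = det3(A_mapped)     # 0: the mapped vectors are dependent, so (3.9) holds
```

The same check with cubes ($s = 4$) cannot succeed on this pattern, since it would require a nontrivial integer solution of $z^3 = x^3 + y^3$; this is one reason larger matrices are needed for other values of $s$.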



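Non-positive-definiteness of the reproducing kernel of $B$. We shall give an example to show that (2.19) might not hold true for the reproducing kernel of a vector-valued RKBS when the number $m$ of sampling points exceeds 2. In fact, we let $m = 3$ and let $B$ be constructed as in the above example, with $\{w_j : j \in \mathbb{N}_3\}$ to be appropriately chosen in the definition (3.7) of $\Phi^*$. Our purpose is to find $w_j \in \mathbb{C}^3$ and $\xi_j \in \Lambda$, $j \in \mathbb{N}_3$, such that

$$ \sum_{j=1}^{3} \sum_{k=1}^{3} [K(x_j, x_k)\xi_j, \xi_k]_\Lambda < 0. \tag{3.10} $$

We first note for all $j, k \in \mathbb{N}_3$ that

$$ [K(x_j, x_k)\xi_j, \xi_k]_\Lambda = [\Phi(x_k)\Phi^\dagger(x_j)\xi_j, \xi_k]_\Lambda = [\Phi^\dagger(x_j)\xi_j, \Phi^\dagger(x_k)\xi_k]_W = [(\Phi^\dagger(x_k)\xi_k)^*, (\Phi^\dagger(x_j)\xi_j)^*]_{W^*} = [\Phi^*(x_k)\xi_k^*, \Phi^*(x_j)\xi_j^*]_{W^*}. $$

We shall choose $\xi_j \in \Lambda$ so that $(\xi_j^*)_1 = 1$ for each $j \in \mathbb{N}_3$. With this choice, we obtain by (3.7) and the above equation that

$$ \sum_{j=1}^{3} \sum_{k=1}^{3} [K(x_j, x_k)\xi_j, \xi_k]_\Lambda = \sum_{j=1}^{3} \sum_{k=1}^{3} [w_k, w_j]_{\ell^s_3}. $$

The conclusion is that for (3.10) to hold, it suffices to find $w_j \in \mathbb{C}^3$, $j \in \mathbb{N}_3$, that form a basis for $\mathbb{C}^3$ but

$$ \sum_{j=1}^{3} \sum_{k=1}^{3} [w_k, w_j]_{\ell^s_3} < 0. $$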
j=l k=l 
Two examples are shown below



4, [wi,W2,W 3 ] 



4 
3 
1 



-3 
4 
1 



, and s = 5, [wi,W2,w 3 ] 



3 2-3 
2-3 3 
-5 4 
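The sufficient condition above can be checked numerically. The triple below was constructed for this sketch and is not one of the paper's examples: $w_1 = (3, 1, 0)$, $w_2 = (-1, 3, 0)$, $w_3 = (-3, -4, 1)$ with $s = 4$ (so $W = \ell^{4/3}_3$). By linearity in the first variable, the double sum equals $\sum_j [w_1 + w_2 + w_3, w_j]_{\ell^s_3}$, which is about $-1.35$ here, while the same vectors give the nonnegative value $\|w_1 + w_2 + w_3\|^2 = 2$ in the Hilbert case $s = 2$.

```python
def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def sip(u, v, p):
    """Semi-inner product (3.4) on real l^p_l."""
    nv = lp_norm(v, p)
    return sum(a * b * abs(b) ** (p - 2) for a, b in zip(u, v)) / nv ** (p - 2)

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def gram_sum(ws, s):
    """sum_j sum_k [w_k, w_j]_{l^s}; always nonnegative when s = 2."""
    return sum(sip(wk, wj, s) for wj in ws for wk in ws)

ws = [[3.0, 1.0, 0.0], [-1.0, 3.0, 0.0], [-3.0, -4.0, 1.0]]
basis_det = det3(ws)            # nonzero, so the w_j form a basis (condition (3.8))
total_s4 = gram_sum(ws, 4.0)    # negative, so (3.10) holds for the induced kernel
total_s2 = gram_sum(ws, 2.0)    # equals ||w_1 + w_2 + w_3||_2^2 = 2
```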



4 Examples of Vector-valued RKBS

We present several examples of vector-valued RKBS in this section. The first one of them is applicable to learning a sensing matrix.

4.1 The space of sensing matrices 

Spaces involved in this example are all over the field $\mathbb{R}$ of real numbers. The input space and output space are chosen as $X := \mathbb{R}^d$ and $\Lambda := \mathbb{R}^n$. The vector-valued RKBS $B$ consists of all the $n \times d$ real matrices. Each $A \in B$ is considered to be a function from $\mathbb{R}^d$ to $\mathbb{R}^n$ with the point evaluation

$$ A(x) := A x, \quad x \in \mathbb{R}^d. $$

To find a norm that makes $B$ a uniform Banach space, we first point out that a finite-dimensional Banach space $V$ is uniform if and only if its norm is strictly convex. For a proof of this simple fact, see, for example, [38]. Recall that $\|\cdot\|_V$ is said to be strictly convex if for all $u, v \in V \setminus \{0\}$, $\|u + v\|_V = \|u\|_V + \|v\|_V$ always implies that $u = \alpha v$ for some $\alpha > 0$. Strictly convex norms on $B$ include

1. column-wise norms:

$$ \|A\|_B := G(\|a_1\|_1, \|a_2\|_2, \ldots, \|a_d\|_d), \quad A \in B, \tag{4.1} $$

where for each $j \in \mathbb{N}_d$, $a_j$ is the $j$-th column of $A$ and $\|\cdot\|_j$ is a strictly convex norm on $\mathbb{R}^n$, and $G$ is a strictly convex function from $\mathbb{R}^d_+$ to $[0, \infty)$ that is strictly increasing with respect to each of its variables and is homogeneous in the sense that

$$ G(\alpha x) = \alpha G(x) \quad \text{for all } x \in \mathbb{R}^d_+ \text{ and } \alpha \in \mathbb{R}_+. $$

It is straightforward to verify that under the above conditions, (4.1) is indeed a strictly convex norm on $B$. An explicit instance is

$$ \|A\|_B := \big\| (\|a_j\|_{\ell^p_n} : j \in \mathbb{N}_d) \big\|_{\ell^r_d}, \quad A \in B, \tag{4.2} $$

where $p, r \in (1, +\infty)$. One can easily transform a column-wise norm $\|\cdot\|_B$ into a row-wise norm by equipping $A \in B$ with $\|A^T\|_B$, where $A^T$ is the transpose of $A$.

2. the $p$-th Schatten norm (see Section 3.5 of [18]):

$$ \|A\|_B := \Big( \sum_{j=1}^{\min(n, d)} \sigma_j(A)^p \Big)^{1/p}, \quad A \in B,\ p \in (1, +\infty), $$

where $\sigma_j(A)$ is the $j$-th singular value of $A$. The $p$-th Schatten norm belongs to the class of matrix norms that are invariant under multiplication by unitary matrices.






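We shall look at the reproducing kernel of $B$ when it is endowed with the norm (4.2) and the output space $\mathbb{R}^n$ is equipped with the norm of $\ell^\gamma_n$ for some $\gamma \in (1, +\infty)$. Let $q, s$ be the conjugate numbers of $p$ and $r$, respectively. In other words, they satisfy (3.3). We get by (2.6) that

$$ \langle A x, \xi^* \rangle_{\ell^\gamma_n} = [A, K(x, \cdot)\xi]_B = \langle A, (K(x, \cdot)\xi)^* \rangle_B, \quad A \in B,\ x \in \mathbb{R}^d,\ \xi \in \mathbb{R}^n, $$

where $B^*$ is identified with the $n \times d$ real matrices through the pairing $\langle A, M \rangle_B := \sum_{i,j} A_{ij} M_{ij}$. This implies that

$$ (K(x, \cdot)\xi)^* = \xi^* x^T, \quad x \in \mathbb{R}^d,\ \xi \in \mathbb{R}^n. \tag{4.3} $$

The dual element of $A \in B$ is given by

$$ A^* = \Big( \frac{\|a_j\|_{\ell^p_n}^{r-2}}{\|A\|_B^{r-2}}\, a_j^* : j \in \mathbb{N}_d \Big), $$

where $a_j^*$ is the dual vector of $a_j$ in $\ell^p_n$. The reproducing kernel of $B$ can be derived from the above two equations, but its explicit form is too complicated to be presented. We shall see from the study of regularized learning schemes in vector-valued RKBS that the identification (4.3) of its dual is usually more important.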
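Although no closed form is given, (4.3) and the formula for $A^*$ suffice to evaluate the kernel numerically. The sketch below assumes that $B^*$ carries the analogous mixed norm with exponents $q, s$ under the pairing $\sum_{i,j} A_{ij} M_{ij}$, and that $J_B^{-1}$ is given by the same column formula with $(q, s)$ in place of $(p, r)$; matrices are stored as lists of columns, and all numerical values, including $\gamma = 2.5$, are hypothetical. It checks that $J_B$ undoes this inverse and that $[A x, \xi]_{\ell^\gamma_n} = \langle A, (K(x, \cdot)\xi)^* \rangle_B$.

```python
def lp_norm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def vec_dual(v, p):
    """Dual vector of v in l^p_n (the zero vector maps to zero)."""
    nv = lp_norm(v, p)
    if nv == 0:
        return [0.0 for _ in v]
    return [t * abs(t) ** (p - 2) / nv ** (p - 2) for t in v]

def mixed_norm(cols, p, r):
    """(4.2): l^r norm of the l^p norms of the columns."""
    return lp_norm([lp_norm(c, p) for c in cols], r)

def matrix_dual(cols, p, r):
    """A^* column by column: ||a_j||^(r-2) a_j^* / ||A||^(r-2)."""
    total = mixed_norm(cols, p, r)
    return [[lp_norm(c, p) ** (r - 2) / total ** (r - 2) * t for t in vec_dual(c, p)]
            for c in cols]

def pairing(cols_a, cols_m):
    return sum(a * m for ca, cm in zip(cols_a, cols_m) for a, m in zip(ca, cm))

p, r = 3.0, 4.0
q, s = p / (p - 1), r / (r - 1)
gamma = 2.5                                # output space l^gamma_n (hypothetical)
x = [1.0, -2.0, 0.5]                       # d = 3
xi = [0.7, -1.3]                           # n = 2

xi_star = vec_dual(xi, gamma)
M = [[xj * t for t in xi_star] for xj in x]   # (4.3): xi^* x^T, column j = x_j xi^*
G = matrix_dual(M, q, s)                      # assumed J_B^{-1}(M) = K(x,.)xi
back = matrix_dual(G, p, r)                   # J_B(G) should reproduce M

A = [[1.0, 0.0], [2.0, -1.0], [0.5, 3.0]]     # a test matrix stored as d = 3 columns
Ax = [sum(A[j][i] * x[j] for j in range(3)) for i in range(2)]
lhs = sum(a * b for a, b in zip(Ax, xi_star))  # [Ax, xi] in l^gamma_n
rhs = pairing(A, back)                         # <A, (K(x,.)xi)^*>_B

def kernel_apply(y):
    """K(x, y) xi = G y, i.e. the matrix G evaluated at the point y."""
    return [sum(G[j][i] * y[j] for j in range(len(y))) for i in range(len(xi))]
```

For instance, `kernel_apply([0.3, 1.0, -0.2])` returns $K(x, y)\xi$ for that $y$.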



4.2 Tensor products of scalar-valued RKBS 

Let $n \in \mathbb{N}$ and let $B_j$, $j \in \mathbb{N}_n$, be scalar-valued RKBS on an input space $X$. We let $B$ be the tensor product of $B_j$, $j \in \mathbb{N}_n$. Thus, it consists of $\mathbb{C}^n$-valued functions of the form $f = (f_j \in B_j : j \in \mathbb{N}_n)$. To define a norm on $B$, we choose functions $\mathcal{N}, \mathcal{N}^*$ from $\mathbb{R}^n_+$ to $\mathbb{R}_+$ that are strictly convex, strictly increasing with respect to each of the variables, homogeneous, and satisfy that $x \mapsto \mathcal{N}^*(|x|)$ is the dual norm of $x \mapsto \mathcal{N}(|x|)$ on $\mathbb{R}^n$. Here, $|x| := (|x_j| : j \in \mathbb{N}_n)$ for each $x \in \mathbb{R}^n$. An example is

$$ \mathcal{N}(x) := \|x\|_{\ell^p_n}, \quad \mathcal{N}^*(x) := \|x\|_{\ell^q_n}, \quad x \in \mathbb{R}^n_+, $$

where $p, q$ are a pair of conjugate numbers in $(1, +\infty)$. With such two gauge functions, we impose the following norm on $B$

$$ \|f\|_B := \mathcal{N}(\|f_1\|_{B_1}, \|f_2\|_{B_2}, \ldots, \|f_n\|_{B_n}), \quad f \in B. \tag{4.4} $$

Proposition 4.1 The tensor product space $B$ with the norm (4.4) is a uniform Banach space.

Proof: We first show that (4.4) defines a uniformly convex norm on $B$. It is straightforward to verify that it is a norm. Let $\varepsilon$ be a fixed positive number and $f, g \in B$ be such that $\|f\|_B = \|g\|_B = 1$ and $\|f - g\|_B \ge \varepsilon$. We have that

$$ \mathcal{N}(\|f_1 + g_1\|_{B_1}, \ldots, \|f_n + g_n\|_{B_n}) \le \mathcal{N}(\|f_1\|_{B_1} + \|g_1\|_{B_1}, \ldots, \|f_n\|_{B_n} + \|g_n\|_{B_n}) \le \mathcal{N}(\|f_1\|_{B_1}, \ldots, \|f_n\|_{B_n}) + \mathcal{N}(\|g_1\|_{B_1}, \ldots, \|g_n\|_{B_n}). $$

As all the norms on $\mathbb{R}^n$ are equivalent, $\mathcal{N}$ is continuous on $\mathbb{R}^n_+$, and the vectors $x \in \mathbb{R}^n$ satisfying $\mathcal{N}(|x|) = 1$ form a compact subset of $\mathbb{R}^n$. We also recall that $\mathcal{N}$ is strictly increasing with respect to each of its variables and $x \mapsto \mathcal{N}(|x|)$ is a strictly convex norm on $\mathbb{R}^n$. We conclude from these facts and the above inequality that $B$ is uniformly convex if there exists some positive constant $\varepsilon'$ independent of $f, g$ such that

$$ \max\{\|f_j\|_{B_j} + \|g_j\|_{B_j} - \|f_j + g_j\|_{B_j} : j \in \mathbb{N}_n\} \ge \varepsilon' $$

or

$$ \max\{|\|f_j\|_{B_j} - \|g_j\|_{B_j}| : j \in \mathbb{N}_n\} \ge \varepsilon'. $$





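Assume to the contrary that such a positive constant does not exist. It implies that for all $\beta > 0$, there exist $f, g \in B$ that satisfy $\|f - g\|_B \ge \varepsilon$ and

$$ \|f_j\|_{B_j} + \|g_j\|_{B_j} - \|f_j + g_j\|_{B_j} < \beta, \quad |\|f_j\|_{B_j} - \|g_j\|_{B_j}| < \beta \quad \text{for all } j \in \mathbb{N}_n. $$

Again, as any two norms on $\mathbb{R}^n$ are equivalent, the inequality $\|f - g\|_B \ge \varepsilon$ implies that $\|f_k - g_k\|_{B_k} \ge \varepsilon_0 > 0$ for some $k \in \mathbb{N}_n$ and some positive constant $\varepsilon_0$ independent of $f, g$. The conclusion is that there exist some $k \in \mathbb{N}_n$ and some positive constants $M, \varepsilon_0$ such that for all $\beta > 0$, there exist $u, v \in B_k$ such that $\|u\|_{B_k} \le M$, $\|v\|_{B_k} \le M$ and

$$ \|u - v\|_{B_k} \ge \varepsilon_0, \quad |\|u\|_{B_k} - \|v\|_{B_k}| < \beta, \quad \|u\|_{B_k} + \|v\|_{B_k} - \|u + v\|_{B_k} < \beta. \tag{4.5} $$

We shall show that (4.5) contradicts the uniform convexity of $B_k$. We may choose $\beta$ so small that $\beta < \varepsilon_0/4$. Since $\|u\|_{B_k} + \|v\|_{B_k} \ge \|u - v\|_{B_k}$, it follows from the first two inequalities of (4.5) that

$$ \|u\|_{B_k} > \frac{\varepsilon_0}{4}, \quad \|v\|_{B_k} > \frac{\varepsilon_0}{4}. \tag{4.6} $$

To proceed, we estimate that

$$ \Big\| \frac{u}{\|u\|_{B_k}} - \frac{v}{\|v\|_{B_k}} \Big\|_{B_k} \ge \frac{\|u - v\|_{B_k}}{\|u\|_{B_k}} - \|v\|_{B_k} \Big| \frac{1}{\|u\|_{B_k}} - \frac{1}{\|v\|_{B_k}} \Big| = \frac{\|u - v\|_{B_k} - |\|u\|_{B_k} - \|v\|_{B_k}|}{\|u\|_{B_k}} > \frac{\varepsilon_0 - \beta}{\|u\|_{B_k}} \ge \frac{3\varepsilon_0}{4M}. $$

By the uniform convexity of $B_k$, there exists a positive constant $\delta$ depending only on $\varepsilon_0$, $M$ and the space $B_k$ such that

$$ \Big\| \frac{u}{\|u\|_{B_k}} + \frac{v}{\|v\|_{B_k}} \Big\|_{B_k} \le 2 - \delta. \tag{4.7} $$

Finally, writing $u + v = \|u\|_{B_k} \big( \frac{u}{\|u\|_{B_k}} + \frac{v}{\|v\|_{B_k}} \big) + (\|v\|_{B_k} - \|u\|_{B_k}) \frac{v}{\|v\|_{B_k}}$, we get by (4.5), (4.6) and (4.7) that

$$ \|u\|_{B_k} + \|v\|_{B_k} - \|u + v\|_{B_k} \ge \|u\|_{B_k} + \|v\|_{B_k} - (2 - \delta)\|u\|_{B_k} - |\|u\|_{B_k} - \|v\|_{B_k}| \ge \delta \|u\|_{B_k} - 2|\|u\|_{B_k} - \|v\|_{B_k}| > \frac{\delta \varepsilon_0}{4} - 2\beta, $$

which contradicts the third inequality of (4.5), as $\beta$ can be arbitrarily small.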
It is clear that $\mathcal{B}^* = \{(f_j^* : j \in \mathbb{N}_n) : f \in \mathcal{B}\}$, equipped with its dual norm. Similar arguments to those above prove that $\mathcal{B}^*$ is uniformly convex. By the fact (see [TTJ]) that a Banach space is uniformly Fréchet differentiable if and only if its dual is uniformly convex, $\mathcal{B}$ is uniformly Fréchet differentiable, and hence uniform. □
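The two estimates that drive the contradiction, namely $\|u/\|u\| - v/\|v\|\| \ge (\|u - v\| - |\|u\| - \|v\||)/\|u\|$ and $\|u\| + \|v\| - \|u + v\| \ge \|u\|(2 - \|u/\|u\| + v/\|v\|\|) - 2|\|u\| - \|v\||$, rest only on the triangle inequality; uniform convexity enters solely through the bound on $\|u/\|u\| + v/\|v\|\|$. As a sanity check that is not part of the original proof, the sketch below tests both inequalities on random vectors in $\ell^p(\mathbb{R}^3)$; the test space, exponents and sample size are our own illustrative choices.

```python
import random


def lp_norm(x, p):
    """l^p norm of a real vector given as a list of floats."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def estimate_slacks(u, v, p):
    """Return (lhs - rhs) for the two estimates; both are >= 0 up to rounding.

    (1) ||u/||u|| - v/||v|| || >= (||u - v|| - | ||u|| - ||v|| |) / ||u||
    (2) ||u|| + ||v|| - ||u + v|| >= ||u|| (2 - ||u/||u|| + v/||v|| ||) - 2 | ||u|| - ||v|| |
    """
    nu, nv = lp_norm(u, p), lp_norm(v, p)
    uh = [t / nu for t in u]  # u / ||u||
    vh = [t / nv for t in v]  # v / ||v||
    lhs1 = lp_norm([a - b for a, b in zip(uh, vh)], p)
    rhs1 = (lp_norm([a - b for a, b in zip(u, v)], p) - abs(nu - nv)) / nu
    lhs2 = nu + nv - lp_norm([a + b for a, b in zip(u, v)], p)
    rhs2 = nu * (2 - lp_norm([a + b for a, b in zip(uh, vh)], p)) - 2 * abs(nu - nv)
    return lhs1 - rhs1, lhs2 - rhs2


random.seed(0)
worst = min(
    min(estimate_slacks([random.gauss(0, 1) for _ in range(3)],
                        [random.gauss(0, 1) for _ in range(3)], p))
    for p in (1.5, 2.0, 3.0)
    for _ in range(1000)
)
print(worst >= -1e-12)
```

The decomposition $u + v = \|u\|(u/\|u\| + v/\|v\|) + (\|v\| - \|u\|)\,v/\|v\|$ behind the second estimate is an exact identity, so any negative slack can only come from floating-point rounding.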
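We next identify the reproducing kernel of $\mathcal{B}$ with the following norm

$$\|f\|_{\mathcal{B}} := \left(\sum_{j=1}^n \|f_j\|_{\mathcal{B}_j}^p\right)^{1/p}, \quad f \in \mathcal{B}.$$

Let the output space $\mathbb{C}^n$ be equipped with the norm of $\ell^p(\mathbb{N}_n)$ and let $K_j$ be the reproducing kernel of $\mathcal{B}_j$, $j \in \mathbb{N}_n$. The unique compatible semi-inner product on $\mathcal{B}$ is given by

$$[f, g]_{\mathcal{B}} = \frac{1}{\|g\|_{\mathcal{B}}^{p-2}}\sum_{j=1}^n [f_j, g_j]_{\mathcal{B}_j}\|g_j\|_{\mathcal{B}_j}^{p-2}, \quad f, g \in \mathcal{B}.$$

The duality mapping on $\mathcal{B}$ is hence of the form

$$f^* = \left(\frac{\|f_j\|_{\mathcal{B}_j}^{p-2}}{\|f\|_{\mathcal{B}}^{p-2}}\,f_j^* : j \in \mathbb{N}_n\right), \quad f \in \mathcal{B}. \tag{4.8}$$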
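A minimal numerical sketch of this semi-inner product, assuming for illustration that each $\mathcal{B}_j$ is a finite-dimensional real space $\ell^{r_j}$ with its compatible semi-inner product $[a, b] = \sum_i a_i b_i |b_i|^{r_j - 2}/\|b\|_{r_j}^{r_j - 2}$; the exponents and vectors are hypothetical. It checks that $[g, g]_{\mathcal{B}} = \|g\|_{\mathcal{B}}^2$ and that $[f, g]_{\mathcal{B}}$ agrees with the derivative of $t \mapsto \|g + tf\|_{\mathcal{B}}^2/2$ at $t = 0$, which characterizes the compatible semi-inner product on a smooth real space.

```python
def lp_norm(x, r):
    """l^r norm of a real vector."""
    return sum(abs(t) ** r for t in x) ** (1.0 / r)


def sip_lr(a, b, r):
    """Compatible semi-inner product on real l^r: sum_i a_i b_i |b_i|^{r-2} / ||b||^{r-2}."""
    nb = lp_norm(b, r)
    if nb == 0.0:
        return 0.0
    s = sum(ai * bi * abs(bi) ** (r - 2) for ai, bi in zip(a, b) if bi != 0.0)
    return s / nb ** (r - 2)


def sum_norm(f, rs, p):
    """||f||_B = (sum_j ||f_j||^p)^{1/p}; component j lives in l^{rs[j]}."""
    return sum(lp_norm(fj, r) ** p for fj, r in zip(f, rs)) ** (1.0 / p)


def sum_sip(f, g, rs, p):
    """[f, g]_B = sum_j [f_j, g_j]_{B_j} ||g_j||^{p-2} / ||g||_B^{p-2}."""
    ng = sum_norm(g, rs, p)
    if ng == 0.0:
        return 0.0
    total = 0.0
    for fj, gj, r in zip(f, g, rs):
        ngj = lp_norm(gj, r)
        if ngj != 0.0:
            total += sip_lr(fj, gj, r) * ngj ** (p - 2)
    return total / ng ** (p - 2)


def gateaux(f, g, rs, p, h=1e-6):
    """Central difference of t -> ||g + t f||_B^2 / 2 at t = 0."""
    plus = [[gi + h * fi for fi, gi in zip(fj, gj)] for fj, gj in zip(f, g)]
    minus = [[gi - h * fi for fi, gi in zip(fj, gj)] for fj, gj in zip(f, g)]
    return (sum_norm(plus, rs, p) ** 2 - sum_norm(minus, rs, p) ** 2) / (4 * h)


rs, p = [3.0, 1.5], 4.0  # hypothetical exponents for B_1, B_2 and the outer l^p
f = [[1.0, -2.0], [0.5, 0.3, 2.0]]
g = [[0.7, 0.4], [-1.2, 0.9, 0.1]]
print(sum_sip(g, g, rs, p), sum_norm(g, rs, p) ** 2)  # [g, g]_B = ||g||_B^2
print(sum_sip(f, g, rs, p), gateaux(f, g, rs, p))     # agree up to finite-difference error
```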
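To find an expression for $(K(x,\cdot)\xi)^*$ for $x \in X$ and $\xi \in \mathbb{C}^n$, we deduce that for all $f \in \mathcal{B}$,

$$[f, K(x,\cdot)\xi]_{\mathcal{B}} = [f(x), \xi]_{\ell^p(\mathbb{N}_n)} = \sum_{j=1}^n \frac{\overline{\xi_j}|\xi_j|^{p-2}}{\|\xi\|_{\ell^p(\mathbb{N}_n)}^{p-2}}\,f_j(x) = \sum_{j=1}^n \frac{\overline{\xi_j}|\xi_j|^{p-2}}{\|\xi\|_{\ell^p(\mathbb{N}_n)}^{p-2}}\,[f_j, K_j(x,\cdot)]_{\mathcal{B}_j}.$$

It follows that

$$(K(x,\cdot)\xi)^* = \left(\frac{\overline{\xi_j}|\xi_j|^{p-2}}{\|\xi\|_{\ell^p(\mathbb{N}_n)}^{p-2}}\,(K_j(x,\cdot))^* : j \in \mathbb{N}_n\right), \quad x \in X,\ \xi \in \mathbb{C}^n. \tag{4.9}$$

By equations (4.8) and (4.9), with $q := p/(p-1)$ the conjugate exponent of $p$,

$$\|K(x,\cdot)\xi\|_{\mathcal{B}} = \frac{1}{\|\xi\|_{\ell^p(\mathbb{N}_n)}^{p-2}}\left(\sum_{j=1}^n |\xi_j|^p\,\|K_j(x,\cdot)\|_{\mathcal{B}_j}^q\right)^{1/q}$$

and

$$K(x,y)\xi = \left(\left(\frac{\|K(x,\cdot)\xi\|_{\mathcal{B}}}{\|\xi\|_{\ell^p(\mathbb{N}_n)}\,\|K_j(x,\cdot)\|_{\mathcal{B}_j}}\right)^{\frac{p-2}{p-1}}\xi_j\,K_j(x,y) : j \in \mathbb{N}_n\right), \quad x, y \in X,\ \xi \in \mathbb{C}^n.$$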



4.3 Translation invariant vector-valued RKBS 

A $\mathbb{C}^n$-valued RKBS $\mathcal{B}$ on $\mathbb{R}^d$ is said to be translation invariant if translations are isometric on $\mathcal{B}$, namely, if for each $f \in \mathcal{B}$ and $x \in \mathbb{R}^d$, $f(\cdot + x) \in \mathcal{B}$ and $\|f(\cdot + x)\|_{\mathcal{B}} = \|f\|_{\mathcal{B}}$. It was proved in [35] that a scalar-valued RKHS is translation invariant if and only if its reproducing kernel is of the form $\psi(x - y)$ for some scalar-valued function $\psi$. For the Banach space case, since a reproducing kernel alone does not determine its RKBS, no such characterization is available. Our purpose in this subsection is to construct a class of translation invariant vector-valued RKBS by the Fourier transform.

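Denote by $L^1(\mathbb{R}^d)$ the Banach space of Lebesgue measurable functions $f$ on $\mathbb{R}^d$ equipped with the norm

$$\|f\|_{L^1(\mathbb{R}^d)} := \int_{\mathbb{R}^d} |f(x)|\,dx.$$

For $\varphi \in L^1(\mathbb{R}^d)$, its Fourier transform $\hat{\varphi}$ and inverse Fourier transform $\check{\varphi}$ are respectively given by

$$\hat{\varphi}(t) := \frac{1}{(\sqrt{2\pi})^d}\int_{\mathbb{R}^d} \varphi(x)\,e^{-ix\cdot t}\,dx, \quad t \in \mathbb{R}^d,$$

and

$$\check{\varphi}(t) := \frac{1}{(\sqrt{2\pi})^d}\int_{\mathbb{R}^d} \varphi(x)\,e^{ix\cdot t}\,dx, \quad t \in \mathbb{R}^d.$$

Here $x \cdot t$ is the standard inner product on $\mathbb{R}^d$.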
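These transforms can be checked numerically in the simplest case $d = 1$. The sketch below (a trapezoidal rule on a truncated interval, both our own choices) approximates $\hat{\varphi}$ for the standard normal density, for which the transform with the normalization above equals $e^{-t^2/2}/\sqrt{2\pi}$; it also confirms that $\check{\varphi}(t) = \hat{\varphi}(-t)$.

```python
import cmath
import math


def transform_1d(phi, t, sign=-1, a=-12.0, b=12.0, n=4800):
    """Trapezoidal approximation (d = 1) of
        (1 / sqrt(2 pi)) * integral of phi(x) exp(sign * i * x * t) dx,
    i.e. the Fourier transform for sign = -1 and the inverse transform for sign = +1."""
    h = (b - a) / n
    total = 0j
    for k in range(n + 1):
        x = a + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * phi(x) * cmath.exp(sign * 1j * x * t)
    return total * h / math.sqrt(2 * math.pi)


def gaussian_density(x):
    """Standard normal density: nonnegative with integral 1."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


for t in (0.0, 0.5, 1.0, 2.0):
    approx = transform_1d(gaussian_density, t)
    exact = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    print(t, abs(approx - exact))
```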

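To start the construction, we let $\phi$ be a nonnegative function in $L^1(\mathbb{R}^d)$ with $\int_{\mathbb{R}^d}\phi(x)\,dx = 1$ and denote by $L^p(\mathbb{R}^d, d\phi)$, $p \in (1, +\infty)$, the Banach space of Lebesgue measurable functions $f$ on $\mathbb{R}^d$ with the norm

$$\|f\|_{L^p(\mathbb{R}^d, d\phi)} := \left(\int_{\mathbb{R}^d} |f(x)|^p\,\phi(x)\,dx\right)^{1/p} < +\infty.$$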

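The feature space $\mathcal{W}$ is chosen as

$$\mathcal{W} := \{u = (u_1, \ldots, u_n) : u_j \in L^p(\mathbb{R}^d, d\phi),\ j \in \mathbb{N}_n\}$$

endowed with the norm

$$\|u\|_{\mathcal{W}} := \left(\sum_{j=1}^n \|u_j\|_{L^p(\mathbb{R}^d, d\phi)}^p\right)^{1/p}.$$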



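Its dual space $\mathcal{W}^*$ is given by

$$\mathcal{W}^* = \{w = (w_1, \ldots, w_n) : w_j \in L^q(\mathbb{R}^d, d\phi),\ j \in \mathbb{N}_n\}, \quad \frac{1}{p} + \frac{1}{q} = 1,$$

with the norm

$$\|w\|_{\mathcal{W}^*} := \left(\sum_{j=1}^n \|w_j\|_{L^q(\mathbb{R}^d, d\phi)}^q\right)^{1/q}.$$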



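The bilinear form on $\mathcal{W} \times \mathcal{W}^*$ is

$$(u, w)_{\mathcal{W}} := \sum_{j=1}^n \int_{\mathbb{R}^d} u_j(x)\,w_j(x)\,\phi(x)\,dx, \quad u \in \mathcal{W},\ w \in \mathcal{W}^*.$$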



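Moreover, the dual element of $u \in \mathcal{W}$ is

$$u^* = \left(\frac{\overline{u_j}\,|u_j|^{p-2}}{\|u\|_{\mathcal{W}}^{p-2}} : j \in \mathbb{N}_n\right).$$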
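To make the duality pairing concrete, here is a discretized sketch in $d = 1$ with real-valued components. It assumes that the measure $\phi(x)\,dx$ is replaced by normalized weights on a uniform grid (a hypothetical stand-in, not part of the construction), uses the real form $u_j^* = u_j|u_j|^{p-2}/\|u\|_{\mathcal{W}}^{p-2}$ of the dual element, and checks $(u, u^*)_{\mathcal{W}} = \|u\|_{\mathcal{W}}^2$ and $\|u^*\|_{\mathcal{W}^*} = \|u\|_{\mathcal{W}}$ with $q = p/(p-1)$.

```python
import math

# Hypothetical discretization: phi(x) dx on R is replaced by weights proportional
# to a Gaussian phi(x_k) on a uniform grid, normalized to sum to 1.
grid = [-6.0 + 12.0 * k / 200 for k in range(201)]
raw = [math.exp(-x * x / 2) for x in grid]
w = [r / sum(raw) for r in raw]


def w_norm(u, p):
    """Discrete stand-in for ||u||_W = (sum_j ||u_j||_{L^p(d phi)}^p)^{1/p}."""
    return sum(abs(x) ** p * wk for uj in u for x, wk in zip(uj, w)) ** (1.0 / p)


def dual_element(u, p):
    """u*_j = u_j |u_j|^{p-2} / ||u||_W^{p-2} (real-valued case)."""
    nu = w_norm(u, p)
    return [[(x * abs(x) ** (p - 2) if x != 0.0 else 0.0) / nu ** (p - 2) for x in uj]
            for uj in u]


def pairing(u, v):
    """Discretized bilinear form (u, v)_W = sum_j integral of u_j v_j phi."""
    return sum(a * b * wk for uj, vj in zip(u, v) for a, b, wk in zip(uj, vj, w))


p = 3.0
q = p / (p - 1)  # conjugate exponent used for W*
u = [[math.sin(x) for x in grid], [x * x - 1.0 for x in grid]]  # n = 2 components
us = dual_element(u, p)
print(pairing(u, us), w_norm(u, p) ** 2)  # (u, u*)_W = ||u||_W^2
print(w_norm(us, q), w_norm(u, p))        # ||u*||_{W*} = ||u||_W
```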



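By Proposition 14. H, $\mathcal{W}$ is a uniform Banach space. Our feature map $\Phi : \mathbb{R}^d \to \mathcal{L}(\mathcal{W}, \mathbb{C}^n)$ is then defined by

$$\Phi(x)u := S\big((u\phi)^\vee(x)\big), \quad x \in \mathbb{R}^d,\ u \in \mathcal{W},$$

where $S$ is an invertible $n \times n$ matrix and $(u\phi)^\vee := ((u_j\phi)^\vee : j \in \mathbb{N}_n)$. The map $\Phi$ is well-defined since $f\phi \in L^1(\mathbb{R}^d)$ for all $f \in L^p(\mathbb{R}^d, d\phi)$ by the Hölder inequality. We also notice that $\Phi(x)$ is continuous from $\mathcal{W}$ to $\mathbb{C}^n$ for each $x \in \mathbb{R}^d$ by the fact that

$$|(f\phi)^\vee(x)| \le \|f\phi\|_{L^1(\mathbb{R}^d)} \le \|f\|_{L^p(\mathbb{R}^d, d\phi)} \quad \text{for all } f \in L^p(\mathbb{R}^d, d\phi).$$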
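The chain of inequalities behind the continuity of $\Phi(x)$ can be observed numerically. The sketch below assumes $d = 1$, $\phi$ the standard normal density, $p = 3$ and a hypothetical test function $f$, all chosen only for illustration, and evaluates the integrals by a trapezoidal rule. The first inequality uses $(2\pi)^{-d/2} \le 1$; the second is Hölder's inequality together with $\int \phi = 1$.

```python
import cmath
import math

# Illustrative assumptions: d = 1, phi = standard normal density, p = 3,
# a hypothetical test function f, and trapezoidal quadrature on [-12, 12].
A, B, N = -12.0, 12.0, 4800
H = (B - A) / N
XS = [A + k * H for k in range(N + 1)]
WT = [0.5 if k in (0, N) else 1.0 for k in range(N + 1)]


def integrate(values):
    """Trapezoidal rule on the fixed grid XS."""
    return H * sum(wt * v for wt, v in zip(WT, values))


def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def f(x):
    return 1.0 + x * x * math.cos(3 * x)


p = 3.0
l1 = integrate([abs(f(x)) * phi(x) for x in XS])                  # ||f phi||_{L^1}
lp = integrate([abs(f(x)) ** p * phi(x) for x in XS]) ** (1 / p)  # ||f||_{L^p(d phi)}
checks = []
for x0 in (0.0, 0.7, 2.5):
    # (f phi)^v(x0) = (2 pi)^{-1/2} * integral of f(t) phi(t) exp(i x0 t) dt
    val = integrate([f(t) * phi(t) * cmath.exp(1j * x0 * t) for t in XS]) / math.sqrt(2 * math.pi)
    checks.append(abs(val) <= l1 <= lp)
print(checks)
```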
One sees that the adjoint operator $\Phi^* : \mathbb{R}^d \to \mathcal{L}(\mathbb{C}^n, \mathcal{W}^*)$ is given by

$$(\Phi^*(x)\eta)(t) = \frac{e^{ix\cdot t}}{(\sqrt{2\pi})^d}\,S^T\eta, \quad t \in \mathbb{R}^d,\ x \in \mathbb{R}^d,\ \eta \in \mathbb{C}^n.$$

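Clearly, the denseness condition (3.5) is satisfied. The equivalent condition (3.2) hence holds true. We obtain by Corollary 3.2 that

$$\mathcal{B} := \{f_u := S(u\phi)^\vee : u \in \mathcal{W}\}$$

with the norm $\|f_u\|_{\mathcal{B}} := \|u\|_{\mathcal{W}}$ and the compatible semi-inner product

$$[S(u\phi)^\vee, S(v\phi)^\vee]_{\mathcal{B}} = [u, v]_{\mathcal{W}} = \frac{1}{\|v\|_{\mathcal{W}}^{p-2}}\sum_{j=1}^n \int_{\mathbb{R}^d} u_j(x)\,\overline{v_j(x)}\,|v_j(x)|^{p-2}\,\phi(x)\,dx$$

is a $\mathbb{C}^n$-valued RKBS. It is translation invariant because for all $y \in \mathbb{R}^d$ and $u \in \mathcal{W}$,

$$\|S(u\phi)^\vee(\cdot + y)\|_{\mathcal{B}} = \|S((e^{iy\cdot(\cdot)}u)\phi)^\vee\|_{\mathcal{B}} = \|e^{iy\cdot(\cdot)}u\|_{\mathcal{W}} = \|u\|_{\mathcal{W}} = \|S(u\phi)^\vee\|_{\mathcal{B}}.$$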
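Translation invariance rests on the identity $(u\phi)^\vee(x + y) = ((e^{iy\cdot(\cdot)}u)\phi)^\vee(x)$, which follows directly from the definition of the inverse Fourier transform, together with $|e^{iy\cdot t}| = 1$. A small numerical sketch in $d = 1$ compares both sides and the corresponding weighted $L^p$ norms; the quadrature, the weight $\phi$, the function $u$, the shift and the exponent are all our own illustrative assumptions.

```python
import cmath
import math

# Illustrative assumptions: d = 1, phi = standard normal density, a hypothetical
# u, shift Y = 1.3, exponent P = 3, and trapezoidal quadrature on [-12, 12].
A, B, N = -12.0, 12.0, 4800
H = (B - A) / N
TS = [A + k * H for k in range(N + 1)]
WT = [0.5 if k in (0, N) else 1.0 for k in range(N + 1)]
Y = 1.3
P = 3.0


def phi(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def u(t):
    return math.cos(t) + 0.5 * t


def modulated(t):
    """e^{i Y t} u(t): same modulus as u(t) at every t."""
    return cmath.exp(1j * Y * t) * u(t)


def inv_transform(func, x):
    """(func * phi)^v(x) = (2 pi)^{-1/2} * integral of func(t) phi(t) exp(i x t) dt."""
    s = sum(wt * func(t) * phi(t) * cmath.exp(1j * x * t) for wt, t in zip(WT, TS))
    return s * H / math.sqrt(2 * math.pi)


def weighted_lp_norm(func):
    """(integral of |func(t)|^P phi(t) dt)^{1/P}."""
    return (H * sum(wt * abs(func(t)) ** P * phi(t) for wt, t in zip(WT, TS))) ** (1 / P)


diffs = [abs(inv_transform(u, x + Y) - inv_transform(modulated, x)) for x in (-1.0, 0.0, 0.8)]
print(max(diffs), abs(weighted_lp_norm(u) - weighted_lp_norm(modulated)))
```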
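To understand the reproducing kernel of $\mathcal{B}$, we present the dual space of $\mathcal{B}$,

$$\mathcal{B}^* = \{S(u^*\phi)^\wedge : u \in \mathcal{W}\},$$

with the norm, compatible semi-inner product and bilinear form

$$\|S(u^*\phi)^\wedge\|_{\mathcal{B}^*} = \|u^*\|_{\mathcal{W}^*}, \quad [S(u^*\phi)^\wedge, S(v^*\phi)^\wedge]_{\mathcal{B}^*} = [v, u]_{\mathcal{W}}, \quad (S(u\phi)^\vee, S(v^*\phi)^\wedge)_{\mathcal{B}} = (u, v^*)_{\mathcal{W}}.$$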

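With these preparations, we identify by (2.6) that

$$(K(x,\cdot)\xi)^* = S(v_x^*\phi)^\wedge, \quad x \in \mathbb{R}^d,\ \xi \in \mathbb{C}^n,$$

where

$$v_x^*(t) := \frac{e^{ix\cdot t}}{(\sqrt{2\pi})^d}\,S^T\xi^*, \quad t \in \mathbb{R}^d,$$

and $\xi^*$ is the dual element of $\xi$ in $\mathbb{C}^n$ under a strictly convex norm. By the above two equations,

$$(K(x,\cdot)\xi)^*(y) = \frac{1}{(\sqrt{2\pi})^d}\,SS^T\xi^*\,\check{\phi}(x - y), \quad x, y \in \mathbb{R}^d,\ \xi \in \mathbb{C}^n.$$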



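We also derive that

$$K(x,y)\xi = \frac{\check{\phi}(y - x)}{(\sqrt{2\pi})^d\,\|S^T\xi^*\|_{\ell^q(\mathbb{N}_n)}^{q-2}}\;S\left(\overline{(S^T\xi^*)_j}\,\big|(S^T\xi^*)_j\big|^{q-2} : j \in \mathbb{N}_n\right), \quad x, y \in \mathbb{R}^d,\ \xi \in \mathbb{C}^n,$$

where $q := p/(p-1)$.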



We remark that when $p = 2$, $\mathbb{C}^n$ is endowed with the standard Euclidean norm $\|\cdot\|$, and $\phi$ is the Gaussian function, $K$ becomes (up to a positive normalizing constant) the Gaussian kernel for $\mathbb{C}^n$-valued RKHS

$$K(x,y) = SS^*\exp\left(-\frac{\|x - y\|^2}{2}\right), \quad x, y \in \mathbb{R}^d,$$

which confirms the validity of the above construction.
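As a sanity check on this remark, a valid matrix-valued kernel must produce positive semi-definite block Gram matrices. The sketch below builds the block Gram matrix of $K(x,y) = SS^*\exp(-\|x - y\|^2/2)$ for a hypothetical real invertible $S$ (so that $SS^* = SS^T$) and a few hypothetical points in $\mathbb{R}^2$, and confirms positive definiteness by a plain Cholesky factorization.

```python
import math


def gaussian_block_gram(xs, S):
    """Block Gram matrix of K(x, y) = S S^T exp(-||x - y||^2 / 2) for real S.

    Entry (i*n + a, j*n + b) equals exp(-||x_i - x_j||^2 / 2) * (S S^T)[a][b].
    """
    n = len(S)
    A = [[sum(S[a][k] * S[b][k] for k in range(n)) for b in range(n)] for a in range(n)]
    m = len(xs)
    G = [[0.0] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(m):
            kij = math.exp(-sum((s - t) ** 2 for s, t in zip(xs[i], xs[j])) / 2)
            for a in range(n):
                for b in range(n):
                    G[i * n + a][j * n + b] = kij * A[a][b]
    return G


def cholesky(M):
    """Plain Cholesky factorization; raises ValueError if M is not positive definite."""
    size = len(M)
    L = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


S = [[1.0, 0.5], [0.0, 2.0]]                                # hypothetical invertible S
xs = [(0.0, 0.0), (1.0, 0.5), (-0.7, 2.0), (2.5, -1.0)]     # hypothetical points in R^2
G = gaussian_block_gram(xs, S)
L = cholesky(G)
print(all(L[i][i] > 0 for i in range(len(G))))
```

Strict positive definiteness here reflects that the scalar Gaussian kernel is strictly positive definite on distinct points and that $SS^T$ is positive definite, so their Kronecker product is as well.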
5 Multi-task Learning with Banach Spaces



We discuss the applications of vector-valued RKBS to the learning of vector-valued functions from finite samples. Specifically, suppose that the unknown target function maps the input space $X$ to an output space $\Lambda$ and that observations of the function at given sampling points $\{x_j : j \in \mathbb{N}_m\} \subseteq X$ are available. The observation at $x_j$, $j \in \mathbb{N}_m$, could be $f(x_j)$ or the application of some continuous linear functional in $\Lambda^*$ to $f(x_j)$, and in practice it is usually corrupted by noise. To handle the noise and achieve a good generalization error, we follow the regularization methodology. For notational simplicity, let $\mathbf{x} := (x_j : j \in \mathbb{N}_m) \in X^m$ and $f(\mathbf{x}) := (f(x_j) : j \in \mathbb{N}_m) \in \Lambda^m$. A general learning scheme has the following form

$$\inf_{f \in \mathcal{B}} \mathcal{Q}(f(\mathbf{x})) + \lambda\Psi(\|f\|_{\mathcal{B}}), \tag{5.1}$$

where $\mathcal{B}$ is a chosen $\Lambda$-valued RKBS on $X$, $\mathcal{Q} : \Lambda^m \to \mathbb{R}_+$ is a loss function, $\lambda$ is a positive regularization parameter, and $\Psi : \mathbb{R}_+ \to \mathbb{R}_+$ is called a regularizer. We are concerned with the existence, uniqueness, representation and computation of the minimizer of (5.1). Before moving on to these topics, let us see some examples of learning schemes of the form (5.1):

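- Regularization networks:

$$\mathcal{Q}(f(\mathbf{x})) := \sum_{j=1}^m \|f(x_j) - \xi_j\|_\Lambda^2, \qquad \Psi(\|f\|_{\mathcal{B}}) := \|f\|_{\mathcal{B}}^2, \tag{5.2}$$

where $\xi_j \in \Lambda$, $j \in \mathbb{N}_m$, are the observed outputs of $f$ at $\mathbf{x}$. In general, one may use

$$\mathcal{Q}(f(\mathbf{x})) = P(\|f(x_1) - \xi_1\|_\Lambda, \ldots, \|f(x_m) - \xi_m\|_\Lambda), \tag{5.3}$$

where $P$ is a function from $\mathbb{R}_+^m$ to $\mathbb{R}_+$. A particular choice of $P$ leads to support vector machine regression.

- Support vector machine regression:

$$\Lambda := \mathbb{R}^n, \qquad \mathcal{Q}(f(\mathbf{x})) := \sum_{j=1}^m \max\left(0, \|f(x_j) - \xi_j\|_\Lambda - \varepsilon\right),$$

where $\varepsilon$ is a positive constant standing for the tolerance level.

- Spectral learning: when $\mathcal{B}$ is the space of sensing matrices introduced in the last section with a unitarily invariant matrix norm, (5.1) is the special spectral learning considered in [2].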
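For concreteness, the first two loss functions above can be written out for $\Lambda = \mathbb{R}^n$ with the Euclidean norm (our choice; the schemes allow any norm on $\Lambda$). The values of $f$ at the sampling points are hypothetical inputs, so this sketch only evaluates $\mathcal{Q}$ and leaves the regularizer aside.

```python
import math


def euclidean(a):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(t * t for t in a))


def regularization_network_loss(preds, obs):
    """Q(f(x)) = sum_j ||f(x_j) - xi_j||^2."""
    return sum(euclidean([a - b for a, b in zip(fx, xi)]) ** 2 for fx, xi in zip(preds, obs))


def svm_regression_loss(preds, obs, eps):
    """Q(f(x)) = sum_j max(0, ||f(x_j) - xi_j|| - eps), the epsilon-insensitive loss."""
    return sum(max(0.0, euclidean([a - b for a, b in zip(fx, xi)]) - eps)
               for fx, xi in zip(preds, obs))


preds = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.5]]  # hypothetical values f(x_j) in R^2
obs = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]    # hypothetical observations xi_j
print(regularization_network_loss(preds, obs))  # 5.25
print(svm_regression_loss(preds, obs, 0.5))     # 2.0
```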

5.1 Existence and Uniqueness 

The weak topology is the weakest topology on a Banach space $V$ such that elements in $V^*$ remain continuous on $V$. A sequence $u_n \in V$, $n \in \mathbb{N}$, is said to converge weakly to $u_0 \in V$ if for each $\mu \in V^*$, $\mu(u_n)$ converges to $\mu(u_0)$. We call a regularizer $\Psi : \mathbb{R}_+ \to \mathbb{R}_+$ admissible if it is continuous and nondecreasing on $\mathbb{R}_+$ with

$$\lim_{t \to +\infty}\Psi(t) = +\infty. \tag{5.4}$$

Proposition 5.1 If $\mathcal{Q} : \Lambda^m \to \mathbb{R}_+$ is continuous with respect to each of its variables under the weak topology on $\Lambda$ and $\Psi$ is an admissible regularizer, then (5.1) has at least one minimizer.
Proof: Arguments similar to those in the proof of Proposition 4 in [37] still apply to the vector-valued case considered here. □

When $\Lambda$ is finite-dimensional, its weak topology coincides with its norm topology. Thus, continuity under the weak topology is equivalent to continuity with respect to the norm of $\Lambda$.

Corollary 5.2 Let $\Lambda$ be finite-dimensional. If $\mathcal{Q} : \Lambda^m \to \mathbb{R}_+$ is continuous with respect to each of its variables and $\Psi$ is an admissible regularizer, then (5.1) has at least one minimizer.



We next deal with the case when the loss function has the form (5.3).



Proposition 5.3 If $P : \mathbb{R}_+^m \to \mathbb{R}_+$ is continuous on $\mathbb{R}_+^m$ and nondecreasing with respect to each of its variables and the regularizer $\Psi$ is admissible, then

$$\inf_{f \in \mathcal{B}} P(\|f(x_1) - \xi_1\|_\Lambda, \ldots, \|f(x_m) - \xi_m\|_\Lambda) + \lambda\Psi(\|f\|_{\mathcal{B}}) \tag{5.5}$$

has a minimizer.
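Proof: Set

$$\mathcal{E}(f) := P(\|f(x_1) - \xi_1\|_\Lambda, \ldots, \|f(x_m) - \xi_m\|_\Lambda) + \lambda\Psi(\|f\|_{\mathcal{B}}), \quad f \in \mathcal{B},$$

and $\mathcal{E}_0 := \inf_{f \in \mathcal{B}}\mathcal{E}(f)$. Using arguments similar to those in [37], we can find a sequence $f_n \in \mathcal{B}$, $n \in \mathbb{N}$, that converges weakly to some $f_0 \in \mathcal{B}$, and some $\alpha > 0$ such that $\|f_0\|_{\mathcal{B}} \le \alpha$ and $\|f_n\|_{\mathcal{B}} \le \alpha$ for all $n \in \mathbb{N}$. Moreover, for any $\varepsilon > 0$ there exists some $N \in \mathbb{N}$ such that for $n \ge N$,

$$\Psi(\|f_n\|_{\mathcal{B}}) \ge \Psi(\|f_0\|_{\mathcal{B}}) - \varepsilon. \tag{5.6}$$

Since $f_n$ converges weakly to $f_0$, by (2.6)

$$\lim_{n \to \infty}[f_n(x_j) - \xi_j, f_0(x_j) - \xi_j]_\Lambda = [f_0(x_j) - \xi_j, f_0(x_j) - \xi_j]_\Lambda \quad \text{for all } j \in \mathbb{N}_m.$$

By the Cauchy-Schwarz inequality for semi-inner products, it follows that for any $\delta > 0$ there exists some $N' \in \mathbb{N}$ such that for $n \ge N'$,

$$\|f_n(x_j) - \xi_j\|_\Lambda \ge \|f_0(x_j) - \xi_j\|_\Lambda - \delta \quad \text{for all } j \in \mathbb{N}_m. \tag{5.7}$$

Since

$$\|f_0(x_j) - \xi_j\|_\Lambda,\ \|f_n(x_j) - \xi_j\|_\Lambda \le \max\left\{\alpha\|\delta_{x_j}\|_{\mathcal{L}(\mathcal{B},\Lambda)} + \|\xi_j\|_\Lambda : j \in \mathbb{N}_m\right\}$$

and $P$ is uniformly continuous on compact subsets of $\mathbb{R}_+^m$ and nondecreasing with respect to each of its variables, we get by (5.7) that

$$P(\|f_n(x_1) - \xi_1\|_\Lambda, \ldots, \|f_n(x_m) - \xi_m\|_\Lambda) \ge P(\|f_0(x_1) - \xi_1\|_\Lambda, \ldots, \|f_0(x_m) - \xi_m\|_\Lambda) - \varepsilon$$

for sufficiently large $n$. This combined with (5.6) proves that $f_0$ is a minimizer of (5.5). □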

For uniqueness of the minimizer, we have the following routine result.

Proposition 5.4 If $\mathcal{Q}$ is convex on $\Lambda^m$ and $\Psi$ is strictly increasing and strictly convex, then (5.1) has at most one minimizer.

Proof: Since $\mathcal{B}$ is uniformly convex, and hence strictly convex, it is straightforward that the function mapping $f \in \mathcal{B}$ to $\mathcal{Q}(f(\mathbf{x})) + \lambda\Psi(\|f\|_{\mathcal{B}})$ is strictly convex on $\mathcal{B}$. □



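We close this subsection with the following corollary to the above propositions.

Corollary 5.5 Let $\mathcal{B}$ be a $\Lambda$-valued RKBS on $X$. Then $\inf_{f \in \mathcal{B}}\mathcal{E}(f)$ has a unique minimizer for the following choices of regularization functionals:

$$\mathcal{E}(f) = \sum_{j=1}^m \|f(x_j) - \xi_j\|_\Lambda^p + \lambda\|f\|_{\mathcal{B}}^r, \quad p \in [1, +\infty),\ r \in (1, +\infty),$$

$$\mathcal{E}(f) = \sum_{j=1}^m \max\left(0, \|f(x_j) - \xi_j\|_\Lambda - \varepsilon\right) + \lambda\|f\|_{\mathcal{B}}^r, \quad r \in (1, +\infty),\ \varepsilon > 0.$$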

5.2 The representer theorem 

We study the representation of the minimizer of (5.1) by the reproducing kernel $K$ of $\mathcal{B}$. The result, known as the representer theorem in the scalar-valued and vector-valued RKHS cases, is due to [21] and [25], respectively. For more references on this subject in the RKHS case, see [U [28] and the references cited therein. We established the representer theorem for scalar-valued RKBS in [36, 37]. The representer theorem is closely related to minimal norm interpolation, so we start by examining the latter problem.

Let $\mathbf{x} := (x_j : j \in \mathbb{N}_m) \in X^m$ be a fixed set of sampling points. Denote for each $\mathbf{z} := (\eta_j : j \in \mathbb{N}_m) \in \Lambda^m$ by $\mathcal{I}_{\mathbf{z}}$ the set of functions $f \in \mathcal{B}$ that satisfy the interpolation condition $f(\mathbf{x}) = \mathbf{z}$. We need two notations for the proof of the representer theorem for minimal norm interpolation. For a subset $A$ of a Banach space $V$, $A^\perp$ stands for the set of all continuous linear functionals on $V$ that vanish on $A$, and for $B \subseteq V^*$, ${}^\perp B := \{u \in V : \mu(u) = 0 \text{ for all } \mu \in B\}$.

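Lemma 5.6 Let $\mathbf{z} \in \Lambda^m$. If $\mathcal{I}_{\mathbf{z}}$ is nonempty, then the minimal norm interpolation problem

$$\inf\{\|f\|_{\mathcal{B}} : f \in \mathcal{I}_{\mathbf{z}}\} \tag{5.8}$$

has a unique minimizer. A function $f_0 \in \mathcal{B}$ is the minimizer of (5.8) if and only if $f_0(\mathbf{x}) = \mathbf{z}$ and

$$f_0^* \in \overline{\operatorname{span}}\{(K(x_j,\cdot)\xi)^* : j \in \mathbb{N}_m,\ \xi \in \Lambda\}. \tag{5.9}$$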

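Proof: Clearly, $\mathcal{I}_{\mathbf{z}}$ is a closed convex subset of $\mathcal{B}$. A minimizer of (5.8) is the best approximation in $\mathcal{I}_{\mathbf{z}}$ to the origin of $\mathcal{B}$. It is well known that a nonempty closed convex subset of a uniformly convex Banach space has a unique best approximation to each point of the space. By this fact, (5.8) has a unique minimizer. It is also trivial that $f_0 \in \mathcal{I}_{\mathbf{z}}$ is the minimizer if and only if

$$\|f_0 + g\|_{\mathcal{B}} \ge \|f_0\|_{\mathcal{B}} \quad \text{for all } g \in \mathcal{I}_{\mathbf{0}}.$$

By the characterization of best approximation by the semi-inner product established in [16], the above inequality holds if and only if

$$[g, f_0]_{\mathcal{B}} = 0 \quad \text{for all } g \in \mathcal{I}_{\mathbf{0}},$$

which can be equivalently expressed as $f_0^* \in (\mathcal{I}_{\mathbf{0}})^\perp$. Note that $g \in \mathcal{I}_{\mathbf{0}}$ if and only if

$$[g, K(x_j,\cdot)\xi]_{\mathcal{B}} = [g(x_j), \xi]_\Lambda = 0 \quad \text{for all } j \in \mathbb{N}_m \text{ and } \xi \in \Lambda,$$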
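which is equivalent to

$$g \in {}^\perp\{(K(x_j,\cdot)\xi)^* : j \in \mathbb{N}_m,\ \xi \in \Lambda\}.$$

We conclude that $f_0 \in \mathcal{I}_{\mathbf{z}}$ is the minimizer of (5.8) if and only if

$$f_0^* \in \left({}^\perp\{(K(x_j,\cdot)\xi)^* : j \in \mathbb{N}_m,\ \xi \in \Lambda\}\right)^\perp.$$

By the Hahn-Banach theorem, $({}^\perp B)^\perp = \overline{\operatorname{span}}\,B$ for each $B \subseteq \mathcal{B}^*$. The proof is hence complete. □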
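In the Hilbert case (a scalar-valued RKHS, where $f^*$ is identified with $f$), Lemma 5.6 reduces to the familiar statement that the minimal norm interpolant lies in $\operatorname{span}\{K(x_j,\cdot)\}$ and is found by solving a Gram system. The sketch below assumes a Gaussian kernel on $\mathbb{R}$ and hypothetical data. It compares the norm of that interpolant with the norm of another interpolant obtained by adding a function $h$ that vanishes at the sampling points; by orthogonality the squared norms differ exactly by $\|h\|^2$.

```python
import math


def k(x, y):
    """Hypothetical scalar Gaussian kernel on R."""
    return math.exp(-(x - y) ** 2 / 2)


def solve(M, b):
    """Dense Gaussian elimination with partial pivoting for M c = b."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def rkhs_norm_sq(points, coeffs):
    """||sum_j c_j k(x_j, .)||^2 = sum_{i,j} c_i c_j k(x_i, x_j)."""
    return sum(ci * cj * k(xi, xj) for xi, ci in zip(points, coeffs) for xj, cj in zip(points, coeffs))


xs = [-1.0, 0.0, 1.5]   # sampling points
z = [0.5, -1.0, 2.0]    # interpolation data
G = [[k(a, b) for b in xs] for a in xs]
c = solve(G, z)         # minimal norm interpolant f0 = sum_j c_j k(x_j, .)
f0_at = [sum(cj * k(xj, x) for xj, cj in zip(xs, c)) for x in xs]

# h = k(x_new, .) - sum_j d_j k(x_j, .) vanishes at every x_j, so f0 + h also interpolates.
x_new = 0.7
d = solve(G, [k(x_new, x) for x in xs])
points = xs + [x_new]
other = [cj - dj for cj, dj in zip(c, d)] + [1.0]
print(rkhs_norm_sq(xs, c), rkhs_norm_sq(points, other))  # the first is smaller
```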
The above lemma enables us to prove the main result of the section without much effort. 

Theorem 5.7 Suppose that (5.1) has at least one minimizer. If the regularizer $\Psi$ is nondecreasing, then (5.1) has a minimizer that satisfies (5.9). If $\Psi$ is strictly increasing, then every minimizer of (5.1) must satisfy (5.9).

Proof: Let $f \in \mathcal{B}$ be a minimizer of (5.1). We let $f_0$ be the minimizer of

$$\min\{\|g\|_{\mathcal{B}} : g \in \mathcal{I}_{f(\mathbf{x})}\}. \tag{5.10}$$

Then $\|f_0\|_{\mathcal{B}} \le \|f\|_{\mathcal{B}}$ and $f_0(\mathbf{x}) = f(\mathbf{x})$. It follows that $\mathcal{Q}(f_0(\mathbf{x})) = \mathcal{Q}(f(\mathbf{x}))$ while $\Psi(\|f_0\|_{\mathcal{B}}) \le \Psi(\|f\|_{\mathcal{B}})$ as $\Psi$ is nondecreasing. Therefore, $f_0$ is a minimizer of (5.1). By Lemma 5.6, $f_0$ satisfies (5.9).

Suppose that $\Psi$ is strictly increasing and $f \in \mathcal{B}$ does not satisfy (5.9). Again, we let $f_0 \in \mathcal{B}$ be the minimizer of (5.10). As $f$ does not satisfy (5.9), $f \ne f_0$ by Lemma 5.6. Thus, $\|f\|_{\mathcal{B}} > \|f_0\|_{\mathcal{B}}$. The consequence is that while $\mathcal{Q}(f(\mathbf{x})) = \mathcal{Q}(f_0(\mathbf{x}))$, $\Psi(\|f\|_{\mathcal{B}}) > \Psi(\|f_0\|_{\mathcal{B}})$ because $\Psi$ is strictly increasing. Therefore, $f$ cannot be a minimizer of (5.1). The proof is complete. □

5.3 Characterization equations 

We consider solving the regularized learning scheme (5.1) in this subsection, making use of the representer theorem. To this end, we note that the output space $\Lambda$ is usually finite-dimensional in practice. Let us assume that (5.1) has a unique minimizer $f_0$, that $\dim(\Lambda) = n < +\infty$, and that $\{e_l^* : l \in \mathbb{N}_n\}$ is a basis for $\Lambda^*$. In this case, we see by property (2.16) of the reproducing kernel $K$ that $f_0$ has the form

$$f_0^* = \sum_{j=1}^m (K(x_j,\cdot)\eta_j)^* \tag{5.11}$$

for some $\eta_j \in \Lambda$, $j \in \mathbb{N}_m$. It hence suffices to find the finitely many model parameters $\eta_j$ in order to obtain $f_0$. To this end, one may substitute (5.11) into (5.1) to convert the original minimization problem in a potentially infinite-dimensional Banach space into one about the finitely many parameters $\eta_j$. We next show how the reformulation can be done under the finite-dimensionality assumption on $\Lambda$. As each $\xi \in \Lambda$ is uniquely determined by $\{[\xi, e_l]_\Lambda : l \in \mathbb{N}_n\}$, we may rewrite the regularization functional as

$$\min_{f \in \mathcal{B}} \mathcal{R}\big(([f(x_j), e_l]_\Lambda : j \in \mathbb{N}_m,\ l \in \mathbb{N}_n)\big) + \lambda\Psi(\|f\|_{\mathcal{B}}) \tag{5.12}$$

for some function $\mathcal{R} : \mathbb{C}^{m \times n} \to \mathbb{R}_+$. By the reproducing property of $K$ and the relation between the semi-inner products on $\mathcal{B}$ and $\mathcal{B}^*$,

$$[f(x_j), e_l]_\Lambda = [f, K(x_j,\cdot)e_l]_{\mathcal{B}} = [(K(x_j,\cdot)e_l)^*, f^*]_{\mathcal{B}^*}.$$
For the regularizer part, we have by (2.2) that $\|f\|_{\mathcal{B}} = \|f^*\|_{\mathcal{B}^*}$. Therefore, the parameters $\eta_j$ in (5.11) form the minimizer of

$$\min_{\eta \in \Lambda^m} \mathcal{R}\left(\left(\left[(K(x_j,\cdot)e_l)^*, \sum_{k=1}^m (K(x_k,\cdot)\eta_k)^*\right]_{\mathcal{B}^*} : j \in \mathbb{N}_m,\ l \in \mathbb{N}_n\right)\right) + \lambda\Psi\left(\left\|\sum_{j=1}^m (K(x_j,\cdot)\eta_j)^*\right\|_{\mathcal{B}^*}\right).$$

Unlike the RKHS case, the above minimization problem is usually non-convex with respect to $\eta$, even when $\mathcal{R}$ and $\Psi$ are both convex. The reason is that a semi-inner product is generally non-additive with respect to its second variable.
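The non-additivity is easy to see in the real space $\ell^r(\mathbb{R}^2)$ with its compatible semi-inner product $[a, b] = \sum_i a_i b_i |b_i|^{r-2}/\|b\|_r^{r-2}$ (an illustrative choice of space). For $f = g = (1, 0)$ and $h = (0, 1)$, the sketch computes $[f, g + h]$ and $[f, g] + [f, h]$: they agree for $r = 2$ (the Hilbert case) but not for $r = 3$, where $[f, g + h] = 2^{-1/3}$ while $[f, g] + [f, h] = 1$.

```python
def lr_norm(x, r):
    """l^r norm of a real vector."""
    return sum(abs(t) ** r for t in x) ** (1.0 / r)


def sip(a, b, r):
    """[a, b] = sum_i a_i b_i |b_i|^{r-2} / ||b||_r^{r-2} on real l^r; r = 2 is the inner product."""
    nb = lr_norm(b, r)
    if nb == 0.0:
        return 0.0
    return sum(ai * bi * abs(bi) ** (r - 2) for ai, bi in zip(a, b) if bi != 0.0) / nb ** (r - 2)


f, g, h = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
gh = [s + t for s, t in zip(g, h)]
for r in (2.0, 3.0):
    print(r, sip(f, gh, r), sip(f, g, r) + sip(f, h, r))
```

The semi-inner product remains linear in its first variable, which is why the reproducing identities stay usable even though sums of kernel terms in the second slot do not combine.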

In some cases, one is able to derive a characterization equation for the minimization problem (5.1), which together with the representer theorem constitutes a powerful tool for converting the minimization into a system of equations about the model parameters in the representer theorem. We shall derive characterization equations for the following particular example of (5.1):

$$\min_{f \in \mathcal{B}} \sum_{j=1}^m \psi(\|f(x_j) - \xi_j\|_\Lambda) + \lambda\Psi(\|f\|_{\mathcal{B}}), \tag{5.13}$$

where $\xi_j$ stands for the observation of the target function at $x_j$ for $j \in \mathbb{N}_m$, and $\psi$ is a chosen loss function from $\mathbb{R}_+$ to $\mathbb{R}_+$. We shall assume that both $\psi$ and $\Psi$ are continuously differentiable and

$$\lim_{t \to 0^+}\frac{\psi(t)}{t} = 0. \tag{5.14}$$

For convenience, we make the convention that $0/0 := 0$. The next two results hold for any $\Lambda$ regardless of its dimension.

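Theorem 5.8 Let $\Psi$ and $\psi$ be continuously differentiable on $\mathbb{R}_+$ with (5.14). A function $f_0 \ne 0$ is the minimizer of (5.13) if and only if

$$\lambda\Psi'(\|f_0\|_{\mathcal{B}})\,\frac{f_0^*}{\|f_0\|_{\mathcal{B}}} + \sum_{j=1}^m \frac{\psi'(\|f_0(x_j) - \xi_j\|_\Lambda)}{\|f_0(x_j) - \xi_j\|_\Lambda}\big(K(x_j,\cdot)(f_0(x_j) - \xi_j)\big)^* = 0. \tag{5.15}$$

The zero function is the minimizer of (5.13) if and only if

$$\|T\|_{\mathcal{B}^*} \le \lambda\Psi'(0), \tag{5.16}$$

where

$$T := \sum_{j=1}^m \frac{\psi'(\|\xi_j\|_\Lambda)}{\|\xi_j\|_\Lambda}\big(K(x_j,\cdot)\xi_j\big)^*.$$

Proof: The proof is similar to that for the scalar-valued RKBS case in [37]. One only needs to handle the semi-inner product in vector-valued RKBS carefully. □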

In the sequel, we discuss the application of the above theorem to the regularization networks

$$\min_{f \in \mathcal{B}} \sum_{j=1}^m \|f(x_j) - \xi_j\|_\Lambda^2 + \lambda\|f\|_{\mathcal{B}}^2. \tag{5.17}$$
To this end, we say that the point evaluations on $\mathcal{B}$ at $x_j$, $j \in \mathbb{N}_m$, are essentially linearly independent if for all $\eta_j \in \Lambda$, $j \in \mathbb{N}_m$,

$$\sum_{j=1}^m [f(x_j), \eta_j]_\Lambda = 0 \quad \text{for all } f \in \mathcal{B}$$

necessitates that $\eta_j = 0$ for each $j \in \mathbb{N}_m$. By (2.6), the point evaluations $\delta_{x_j}$, $j \in \mathbb{N}_m$, are essentially linearly independent if and only if

$$\sum_{j=1}^m (K(x_j,\cdot)\eta_j)^* = 0$$

implies that $\eta_j = 0$ for each $j \in \mathbb{N}_m$.

Corollary 5.9 Suppose that the point evaluations on $\mathcal{B}$ at $x_j$, $j \in \mathbb{N}_m$, are essentially linearly independent. Then $f_0$ is the minimizer of the regularization network (5.17) if and only if it is of the form (5.11), where the parameters $\eta_j$ satisfy

$$\lambda\eta_j + f_0(x_j) - \xi_j = 0 \quad \text{for all } j \in \mathbb{N}_m. \tag{5.18}$$

Proof: For the regularization network (5.17), (5.15) and (5.16) are equivalent to each other when $f_0 = 0$. By Theorem 5.8, $f_0$ is the minimizer of (5.17) if and only if

$$\lambda f_0^* + \sum_{j=1}^m \big(K(x_j,\cdot)(f_0(x_j) - \xi_j)\big)^* = 0. \tag{5.19}$$

Thus, $f_0$ has the form (5.11). Since $\delta_{x_j}$, $j \in \mathbb{N}_m$, are essentially linearly independent, (5.19) is equivalent to the parameters $\eta_j$ in (5.11) satisfying (5.18). The proof is complete. □
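In the Hilbert case $p = 2$ with real scalars, the dual map is the Riesz identification, so (5.11) reads $f_0 = \sum_k K(x_k,\cdot)\eta_k$ and (5.18) becomes the linear system $\lambda\eta_j + \sum_k K(x_k, x_j)\eta_k = \xi_j$. The sketch below solves it for a hypothetical separable kernel $K(x, y) = k(x, y)A$, with $k$ a scalar Gaussian kernel on $\mathbb{R}$ and $A$ a positive definite $2 \times 2$ matrix, and then checks that the residual of (5.18) vanishes. For $p \ne 2$ the corresponding equations are nonlinear in the $\eta_j$, and this sketch does not cover that case.

```python
import math


def solve(M, b):
    """Dense Gaussian elimination with partial pivoting for M c = b."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def scalar_k(x, y):
    """Hypothetical scalar Gaussian kernel on R."""
    return math.exp(-(x - y) ** 2 / 2)


A = [[1.25, 1.0], [1.0, 4.0]]   # hypothetical positive definite output matrix
xs = [-1.0, 0.0, 0.5, 2.0]      # sampling points
xi = [[0.1, 1.0], [0.4, -0.2], [0.0, 0.3], [-1.0, 0.8]]  # hypothetical observations in R^2
lam = 0.1
m, n = len(xs), len(A)

# Inserting f0 = sum_k K(x_k, .) eta_k into lam * eta_j + f0(x_j) = xi_j gives the
# block system (k(x_j, x_k) A + lam * delta_jk I) eta = xi.
M = [[scalar_k(xs[j], xs[k]) * A[a][b] + (lam if (j, a) == (k, b) else 0.0)
      for k in range(m) for b in range(n)]
     for j in range(m) for a in range(n)]
flat = solve(M, [v for row in xi for v in row])
eta = [flat[j * n:(j + 1) * n] for j in range(m)]


def f0(x):
    """f0(x) = sum_k k(x_k, x) A eta_k."""
    return [sum(scalar_k(xk, x) * sum(A[a][b] * ek[b] for b in range(n))
                for xk, ek in zip(xs, eta))
            for a in range(n)]


residual = max(abs(lam * eta[j][a] + f0(xs[j])[a] - xi[j][a])
               for j in range(m) for a in range(n))
print(residual)
```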

Similarly, one may substitute the representer theorem into the characterization equations (5.15) and (5.18) to reduce the minimization problem to solving a system of equations about the parameters $\eta_j$. Again, due to the non-additivity of a semi-inner product with respect to its second variable, the resulting equations are generally nonlinear in the parameters. We conduct the reformulation when $\Lambda$ is of finite dimension $n \in \mathbb{N}$ and $\{e_l^* : l \in \mathbb{N}_n\}$ forms a basis for $\Lambda^*$. In this case, (5.18) can be reformulated as

$$\lambda[\eta_j, e_l]_\Lambda + \left[(K(x_j,\cdot)e_l)^*, \sum_{k=1}^m (K(x_k,\cdot)\eta_k)^*\right]_{\mathcal{B}^*} = [\xi_j, e_l]_\Lambda, \quad j \in \mathbb{N}_m,\ l \in \mathbb{N}_n.$$

We leave the solution of the resulting non-convex minimization problem and of the nonlinear equations in the parameters of the representer theorem for future study.